Notes on software developmenthttp://notes.eatonphil.com/Notes on software developmenthttp://www.rssboard.org/rss-specificationpython-feedgenenFri, 28 Feb 2025 18:07:45 +0000Minimal downtime Postgres major version upgrades with EDB Postgres Distributedhttp://notes.eatonphil.com/2025-02-28-minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed'" /> </head><p>This is an external post of mine. Click <a href="https://www.enterprisedb.com/blog/minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2025-02-28-minimal-downtime-postgres-major-version-upgrades-edb-postgres-distributed.htmlFri, 28 Feb 2025 00:00:00 +0000From web developer to database developer in 10 yearshttp://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.html<p>Last month I completed my first year at EnterpriseDB. I'm on the team that built and maintains <a href="https://github.com/2ndQuadrant/pglogical">pglogical</a> and who, over the years, contributed a good chunk of the logical replication functionality that exists in community Postgres. Most of my work, our work, is in C and Rust with tests in Perl and Python. Our focus these days is a descendant of pglogical called <a href="https://www.enterprisedb.com/docs/pgd/latest/">Postgres Distributed</a> which supports replicating DDL, tunable consistency across the cluster, etc.</p> <p>This post is about how I got here.</p> <h3 id="black-boxes">Black boxes</h3><p>I was a web developer from 2014-2021†. I wrote JavaScript and HTML and CSS and whatever server-side language: Python or Go or PHP. I was a hands-on engineering manager from 2017-2021. 
I was pretty clueless about databases and indeed database knowledge was not a serious part of any interview I did.</p> <p>Throughout that time (2014-2021) I wanted to move my career forward as quickly as possible so I spent much of my free time doing educational projects and writing about them on this blog (or previous incarnations of it). I learned how to write primitive HTTP servers, how to write little parsers and interpreters and compilers. It was a virtuous cycle because the internet (Hacker News anyway) liked reading these posts and I wanted to learn how the black boxes worked.</p> <p>But I shied away from data structures and algorithms (DSA) because they seemed complicated and useless to the work that I did. That is, until 2020 when an inbox page I built started loading more and more slowly as the inbox grew. My coworker pointed me at <a href="https://use-the-index-luke.com/">Use The Index, Luke</a> and the DSA scales fell from my eyes. I wanted to understand this new black box so I <a href="https://notes.eatonphil.com/database-basics.html">built a little in-memory SQL database</a> with support for indexes.</p> <p>I'm a college dropout so even while I was interested in compilers and interpreters earlier in my career I never dreamed I could get a job working on them. Only geniuses and PhDs did that work and I was neither. The idea of working on a database felt the same. However, I could work on little database side projects like I had done before on other topics, <a href="https://notes.eatonphil.com/tags/databases.html">so I did</a>. Or a <a href="https://notes.eatonphil.com/tags/raft.html">series of explorations</a> of Raft implementations, others' and my own.</p> <h3 id="startups">Startups</h3><p>From 2021-2023 I tried to start <a href="https://github.com/multiprocessio/datastation">a company</a> and when that didn't pan out I joined TigerBeetle as a cofounder to work on marketing and community. 
It was during this time I started the <a href="https://eatonphil.com/discord.html">Software Internals Discord</a> and <a href="https://www.reddit.com/r/databasedevelopment/">/r/databasedevelopment</a> which have since kind of exploded in popularity among professionals and academics in database and distributed systems.</p> <p>TigerBeetle was my first job at a database company, and while I contributed bits of code I was not a developer there. It was a <a href="https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html">way into the space</a>. And indeed it was an incredible learning experience both on the cofounder side and on the database side. I wrote articles with King and Joran that helped teach and affirm for myself the basics of databases and consensus-based distributed systems.</p> <h3 id="holding-out">Holding out</h3><p>When I left TigerBeetle in 2023 I was still not sure if I could get a job as an actual database developer. My network had exploded since 2021 (when I started my own company that didn't pan out) so I had no trouble getting referrals at database companies.</p> <p>But my background kept leading hiring managers to suggest putting me on cloud teams doing orchestration in Go <em>around</em> a database rather than working on the database itself.</p> <p>I was unhappy with this type-casting so I held out while unemployed and continued to write posts and <a href="https://eatonphil.com/archive.html">host virtual hackweeks</a> messing with Postgres and MySQL. I started the <a href="https://eatonphil.com/2024-database-design-and-implementation.html">first incarnation</a> of the Software Internals Book Club during this time, reading Designing Data Intensive Applications with 5-10 other developers in Bryant Park. 
During this time I also started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee Club</a>.</p> <h3 id="postgres">Postgres</h3><p>After about four months of searching I ended up with three good offers, all to do C and Rust development on Postgres (extensions) as an individual contributor. Working on extensions might sound like the definition of not-sexy, but Postgres APIs are so loosely abstracted it's really as if you're working on Postgres itself.</p> <p>You can mess with almost anything in Postgres so you have to be very aware of what you're doing. And when you can't mess with something in Postgres because an API doesn't yet exist, companies have the tendency to just fork Postgres so they can. (This tendency isn't specific to Postgres, almost every open-source database company seems to have a long-running internal fork or two of the database.)</p> <h3 id="enterprisedb">EnterpriseDB</h3><p>Two of the three offers were from early-stage startups and after more than 3 years being part of the earliest stages of startups I was happy for a break. But the third offer was from <a href="https://www.enterprisedb.com/blog/Which-Companies-Supporting-PostgreSQL-Development">one of the biggest contributors</a> to Postgres, a 20-year old company called EnterpriseDB. (You can probably come up with different rankings of companies using different metrics so I'm only saying EnterpriseDB is <em>one</em> of the biggest contributors.)</p> <p>It seemed like the best place to be to learn a lot and contribute something meaningful.</p> <p>My coworkers are a mix of Postgres veterans (people who contributed the WAL to Postgres, who contributed MVCC to Postgres, who contributed logical decoding and logical replication, who contributed parallel queries; the list goes on and on) but also my developer-coworkers are people who started at EnterpriseDB on technical support, or who were previously Postgres administrators.</p> <p>It's quite a mix. 
Relatively few geniuses or PhDs, despite what I used to think, but they certainly work hard and have hard-earned experience.</p> <p>Anyway, I've now been working at EnterpriseDB for over a year so I wanted to share this retrospective. I also wanted to cover what it's like coming from engineering management and founding companies to going back to being an individual contributor. (Spoiler: incredibly enjoyable.) But it has been hard enough to make myself write this much so I'm calling it a day. :)</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post about the winding path I took from web developer to database developer over 10 years. <a href="https://t.co/tf8bUDRzjV">pic.twitter.com/tf8bUDRzjV</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1890817374644826387?ref_src=twsrc%5Etfw">February 15, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>† From 2011-2014 I also did contract web development but this was part-time while I was in school.</p> http://notes.eatonphil.com/2025-02-15-from-web-developer-to-database-developer-in-10-years.htmlSat, 15 Feb 2025 00:00:00 +0000Edit for clarityhttp://notes.eatonphil.com/2025-01-29-edit-for-clarity.html<p>I have the fortune to review a <a href="https://eatonphil.com/editor.html">few</a> important blog posts every year and the biggest value I add is to call out sentences or sections that make no sense. It is quite simple and you can do it too.</p> <p>Without clarity only those at your company in marketing and sales (whose job it is to work with what they get) will give you the courtesy of a cursory read and a like on LinkedIn. This is all that most corporate writing achieves. It is the norm and it is understandable.</p> <p>But if you want to reach an audience beyond those folks, you have to make sure you're not writing nonsense. 
And you, as reviewer and editor, have the chance to call out nonsense if you can get yourself to recognize it.</p> <h3 id="immune-to-nonsense">Immune to nonsense</h3><p>But especially when editing blog posts at work, it is easy to gloss over things that make no sense because we are so constantly bombarded by things that make no sense. Maybe it's buzzwords or cliches, or simply lack of rapport. We become immune to nonsense.</p> <p>And even worse, without care, as we become more experienced, we become more fearful to say "I have no idea what you are talking about". We're afraid to look incompetent by admitting our confusion. This fear is understandable, but is itself stupid. And I will trust you to deal with this on your own.</p> <h3 id="read-it-out-loud">Read it out loud</h3><p>So as you review a post, read it out loud to yourself. And if you find yourself saying "what on earth are you talking about", add that as a comment as gently as you feel you should. It is not offensive to say this (depending on how you say it). It is surely the case that the author did not know they were making no sense. It is worse to not mention your confusion and allow the author to look like an idiot or a bore.</p> <p>Once you can call out what does not make sense to you, then read the post again and consider what would not make sense to someone without the context you have. Someone outside your company. Of course you need to make assumptions about the audience to a degree. It is likely your customers or prospects you have in mind. Not your friends or family.</p> <p>With the audience you have in mind, would what you're reading make any sense? Has the author given sufficient background or introduced relevant concepts before bringing up something new?</p> <p>Again this is a second step though. The first step is to make sure that the post makes sense to <em>you</em>. 
In almost every draft I read, at my company or not, there is something that does not make sense to me.</p> <p>Do two paragraphs need to be reordered because the first one accidentally depended on information mentioned in the second? Are you making ambiguous use of pronouns? And so on.</p> <h3 id="in-closing">In closing</h3><p>Clarity on its own will put you in the 99th percentile of writing. Beyond that it definitely still matters if you are compelling and original and whatnot. But too often it seems we focus on being exciting rather than being clear. But it doesn't matter if you've got something exciting if it makes no sense to your reader.</p> <p>This sounds like mundane guidance, but I have reviewed many posts that were reviewed by other people and no one else called out nonsense. I feel compelled to mention how important it is.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a new post on the most important, and perhaps least done, thing you can do while reviewing a blog post: edit for clarity. <a href="https://t.co/ODblOUzB3g">pic.twitter.com/ODblOUzB3g</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1884735729625952692?ref_src=twsrc%5Etfw">January 29, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2025-01-29-edit-for-clarity.htmlWed, 29 Jan 2025 00:00:00 +0000An explosion of transitive dependencieshttp://notes.eatonphil.com/2025-01-25-an-explosion-of-transitive-dependencies.html<p>A small standard library means an explosion in transitive dependencies. A more comprehensive standard library helps you minimize dependencies. Don't misunderstand me: in a real-world project, it is practically impossible to have zero dependencies.</p> <p>Armin Ronacher <a href="https://lucumr.pocoo.org/2025/1/24/build-it-yourself/">called</a> for a vibe shift among programmers and I think that this actually exists already. 
Everyone I speak to on this topic has agreed that minimizing dependencies is ideal.</p> <p>Rust and JavaScript, with their incredibly minimal standard libraries, <a href="https://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.html#standard-library">work against this ideal</a>. Go, Python, Java, and C# in contrast have a decent standard library, which helps minimize the explosion of transitive dependencies.</p> <h3 id="examples">Examples</h3><p>I think the standard library should reasonably include:</p> <ul> <li>JSON, CSV, and Parquet support</li> <li>HTTP/2 support (which includes TLS, compression, random number generation, etc.)</li> <li>Support for asynchronous IO</li> <li>A logging abstraction</li> <li>A SQL client abstraction</li> <li>Key abstract data types (BTrees, hashmaps, sets, and growable arrays)</li> <li>Utilities for working with Unicode, time and timezones</li> </ul> <p>But I don't think it needs to include:</p> <ul> <li>Excel support</li> <li>PostgreSQL or Oracle clients</li> <li>Flatbuffers support</li> <li>Niche data structures</li> </ul> <p>Neither of these are intended to be complete lists, just examples.</p> <h3 id="walled-gardens">Walled gardens</h3><p>Minimal standard libraries force growing companies to build out their own internal collection of "standard libraries". As one example, Bloomberg <a href="https://github.com/bloomberg/bde/wiki">did this</a> with C++. And I've heard of companies doing this already with Rust. This allows larger companies to manage and minimize the explosion of transitive dependencies over time.</p> <p>All growing companies likely do something like this eventually. But again, smaller standard libraries incentivize companies to build this internal standard library earlier on. And the community benefits relatively little from these internal standard libraries. 
The community would benefit more if large organizations contributed back to an actual standard library.</p> <p>Smaller organizations do not have the capacity to build these internal standard libraries.</p> <p>Maybe the situation will lead to libraries like Boost for JavaScript and Rust programmers. That could be fine.</p> <h3 id="versioning">Versioning</h3><p>A comprehensive standard library does not prevent the language developers from releasing new versions of the standard library. It is trivial to do this with naming, as Go has done with the <a href="https://go.dev/blog/v2-go-modules">v2</a> pattern. <a href="https://go.dev/blog/randv2">math/rand/v2</a> is an example.</p> <h3 id="conclusion">Conclusion</h3><p>I'm primarily thinking about maintainability, not security. You can read about the <a href="https://medium.com/@john_25313/c-isnt-a-hangover-rust-isn-t-a-hangover-cure-580c9b35b5ce#:~:text=Rust%20makes%20it,for%20their%20libraries.">security risks</a> of using a language with an ecosystem like Rust from someone who is an expert on the matter.</p> <p>My concern about the standard library does not stop me from using Rust and JavaScript. They could choose to invest in the standard library at any time. We have already begun to see <a href="https://bun.sh/docs/api/s3">Bun</a> and <a href="https://jsr.io/@std">Deno</a> do exactly this. But it is clearly an area for improvement in Rust and JavaScript. And a mistake for other languages to avoid repeating.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">While zero dependencies is practically impossible, everyone I&#39;ve spoken to agrees that minimizing dependencies is ideal. Rust and JavaScript work against this ideal. But they could change at any time. 
And Bun and Deno are already examples of this.<a href="https://t.co/qkSh6oW1Yd">https://t.co/qkSh6oW1Yd</a> <a href="https://t.co/mY1MNErZG7">pic.twitter.com/mY1MNErZG7</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1883162142888853945?ref_src=twsrc%5Etfw">January 25, 2025</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2025-01-25-an-explosion-of-transitive-dependencies.htmlSat, 25 Jan 2025 00:00:00 +0000Embedding Python in Rust (for tests)http://notes.eatonphil.com/2025-01-22-embedding-python-rust-tests.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/embedding-python-rust-tests'" /> </head><p>This is an external post of mine. Click <a href="https://www.enterprisedb.com/blog/embedding-python-rust-tests">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2025-01-22-embedding-python-rust-tests.htmlWed, 22 Jan 2025 00:00:00 +0000Logical replication in Postgres: Basicshttp://notes.eatonphil.com/2025-01-17-logical-replication-postgres-basics.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/logical-replication-postgres-basics'" /> </head><p>This is an external post of mine. Click <a href="https://www.enterprisedb.com/blog/logical-replication-postgres-basics">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2025-01-17-logical-replication-postgres-basics.htmlFri, 17 Jan 2025 00:00:00 +0000How I run a coffee clubhttp://notes.eatonphil.com/2024-12-31-how-i-run-a-coffee-club.html<p>I started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee Club</a> in December of 2023. It's gone pretty well! I regularly get around 20 people each month. You bring a drink if you feel like it and you hang out with people for an hour or two.</p> <p>There is no agenda, there is no speaker, there is no structure. 
The only "structure" is that when the circle of people talking to each other gets too big, I break the circle up into two smaller circles so we can get more conversations going.</p> <p><img src="/assets/coffeeclub.png" alt="/assets/coffeeclub.png"></p> <p>People tend to talk in a little circle and then move around over time. It's basically no different than a happy hour except it is over a non-alcoholic drink and it's in the morning.</p> <p>All I have to do as the organizer is periodically tell people about the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">Google Form</a> to fill out. I got people to sign up to the list by posting about this on Twitter and LinkedIn. And then once a month I send an email bcc-ing everyone on the list and ask them to respond for an invite.</p> <p><img src="/assets/coffeeclub-invite.png" alt="/assets/coffeeclub-invite.png"></p> <p>The first 20 people to respond get a calendar invite.</p> <p><img src="/assets/coffee-club-invite.png" alt="/assets/coffeeclub-invite.png"></p> <p>I mention all of this because people ask how they can start a coffee club in their city. They ask how it works. But it's very simple! One of the least-effortful ways to bring together people in your city.</p> <p>If your city does not have indoor public spaces, you could use a food court, or a cafe, or a park during months where it is warm.</p> <p>For example, the <a href="https://blinsay.com/chc3/">Cobble Hill Computer Coffee Club</a> is one that meets outdoors at a park.</p> <p>Good luck! :)</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">How I run a coffee club, a short guide for others who might be interested in running one. 
It&#39;s very simple!<a href="https://t.co/UgRWDQOA3v">https://t.co/UgRWDQOA3v</a> <a href="https://t.co/5wYrLW7u6D">pic.twitter.com/5wYrLW7u6D</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1874213922271879650?ref_src=twsrc%5Etfw">December 31, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-12-31-how-i-run-a-coffee-club.htmlTue, 31 Dec 2024 00:00:00 +0000Picking up volleyball in NYC with Goodrec and New York Urbanhttp://notes.eatonphil.com/2024-12-26-volleyball-in-nyc.html<p>I was so intimidated to go at first, but it is in fact easy and fun to start playing beginner volleyball in New York. The people are so friendly and welcoming that it has been easy to keep playing consistently every week since I started for the first time this August. It's been a great workout and a great way to make friends!</p> <p>The two platforms I've used to find volleyball games are <a href="https://www.goodrec.com/">Goodrec</a> and <a href="https://www.nyurban.com/">New York Urban</a>. While these platforms may also offer classes and leagues, I mostly use them to play "pickup" games. Pickup games are where you show up and join (or get assigned to) a team to play for an hour or two. Easy to go on your own or with friends.</p> <p>I'm not an expert! My only hope with this post is that maybe it makes trying out volleyball in New York feel a little less intimidating for you!</p> <h3 id="goodrec">Goodrec</h3><p>With Goodrec you have to use their mobile app. Beginner tier is called "social" on Goodrec. So browse available games until you find one at the level you want to play. You enroll in (buy a place in) sessions individually.</p> <p>Sessions are between 90-120 minutes long.</p> <p><img src="/assets/goodrec-social.png" alt="/assets/goodrec-social.png"></p> <p>They ask you not to arrive more than 10 minutes early at the gym. 
When you arrive you tell the gym managers (usually at a desk up front somewhere) you're there for Goodrec and the tier (in case the gym has games at multiple levels going on at the same time). Then you wait until the Goodrec "host" arrives and they will organize everyone into teams.</p> <p>Goodrec hosts are players who volunteer to organize the games. They'll explain the rules of the game (which makes Goodrec very good for beginners) and otherwise help you out.</p> <p>Always say thank you to your host!</p> <h3 id="new-york-urban">New York Urban</h3><p>With New York Urban, pickup sessions are called <a href="https://www.nyurban.com/open-play-volleyball">"open play"</a>.</p> <p>There is no mobile app; you just use the website to purchase a spot in a session. The sessions are longer and cheaper than Goodrec. But there is no host; players self-organize.</p> <p>The options are more limited too. You play at one of four high schools on either a Friday night or on Sunday. And session slots tend to sell out much more quickly than with Goodrec.</p> <p><img src="/assets/nyurban-beginner.png" alt="/assets/nyurban-beginner.png"></p> <h3 id="big-city-volleyball">Big City Volleyball</h3><p>You can also check out <a href="https://bigcityvolleyball.com/">Big City Volleyball</a> but I haven't used it yet.</p> <h3 id="volo">Volo</h3><p>I haven't ever done Volo but I think I've heard it described as "beer league", meaning that even some of the beginner tier sessions with Goodrec and New York Urban are more competitive.</p> <p>But also, Volo is built around leagues so you have to get the timing right. Goodrec's and New York Urban's pickup games make it easy to get started playing any time of year.</p> <h3 id="making-friends">Making friends</h3><p>It was super awkward to go at first! I went by myself. I didn't know what I was doing. I couldn't remember, and didn't know, many rules. 
I didn't have court shoes or knee pads.</p> <p>But the Goodrec host system is particularly great for bringing beginners in and making them feel welcome. You have a great time even if you're terrible.</p> <p>The first game I went to, I tried to hang out afterward to meet people. But people either came with their SO or with their friends or by themselves so they all just left immediately or hung out in their group.</p> <p>So you can't just go once and expect to make friends immediately. But if you keep going at the same place and time regularly week over week, you'll see familiar faces. Maybe half the people I play with each week are regulars. If you're friendly you'll start making friends with these people and eventually start going out to bars with them after the games.</p> <h3 id="improving">Improving</h3><p>Even if you find yourself embarrassingly bad at first, just keep going! I'm 29, 6'1, 190lbs and from observation the past 5 months, age, height, and weight have a very indirect relation to playing ability.</p> <p>Most of the people who play are self-taught, especially at the lower tiers I've played at. But some people played for the school team in high school or college. These people are fun to play with and you can learn a lot from them.</p> <p>Most people who are self-taught seem to watch YouTube videos like <a href="https://www.youtube.com/channel/UCoEMagRUvrXELuJZwS4DevA">Coach Donny</a>, helpful for learning how to serve, set, block, etc. Or they take "clinics" (classes) with Goodrec or other platforms. (I have no idea about these, I've never done them before.)</p> <p>At first I played 2 hours a week and I was completely exhausted after the session. Over time it got easier so I started playing 2-3 sessions a week (6-9-ish hours). With practice and consistency (after about 3-4 months), I started playing Intermediate tier with Goodrec and New York Urban. 
And I don't think I'll play Beginner/Social at all anymore.</p> <p>I still primarily play for fun and for the workout and to meet people. But it's also fun to get better!</p> <p>I played with one person much better than myself in an Intermediate session one time and he mentioned he will probably stop playing Intermediate and only play High Intermediate. He mentioned you get better when you keep pushing yourself to play with better and better players. Good advice!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a little post on picking up volleyball in new york.<br><br>It&#39;s fun, and a great workout, and you meet interesting people!<a href="https://t.co/jEWHbRWF6C">https://t.co/jEWHbRWF6C</a> <a href="https://t.co/ipuIUB1ZnM">pic.twitter.com/ipuIUB1ZnM</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1872394142212661250?ref_src=twsrc%5Etfw">December 26, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-12-26-volleyball-in-nyc.htmlThu, 26 Dec 2024 00:00:00 +00001 million page viewshttp://notes.eatonphil.com/2024-11-28-1-million-views.html<p>I was delighted to notice this morning that this site has recently passed 1M page views. And since Murat <a href="https://muratbuffalo.blogspot.com/2017/02/1-million-pageviews.html">wrote</a> about his 1M page view accomplishment at the time, I felt compelled to now too.</p> <p><img src="/assets/1m-page-views.png" alt="/assets/1m-page-views.png"></p> <p>I started regularly blogging in 2018. For some reason I decided to write a blog post every month. And while I have definitely skipped a month or two here or there, on average I've written 2 posts per month.</p> <h3 id="tooling">Tooling</h3><p>Since at least 2018 this site has been built with a static site generator. 
I might have used a 3rd-party generator at one point, but for as long as I can remember most of this site has been built with a <a href="https://github.com/eatonphil/eatonphil.com/blob/main/notes/scripts/build.py">little Python script</a> I wrote.</p> <p>I used to get so pissed when static site generators would pointlessly change their APIs and I'd have to make pointless changes. I have not had to make any significant changes to my build code in many years.</p> <p>I hosted the site itself on GitHub Pages for many years. But I wanted more flexibility with subdomains (ultimately not something I liked) and the ability to view server-side logs (ultimately not something I ever do).</p> <p>I think this site is hosted on an OVH machine now. But at this point it is inertia keeping me there. If you have no strong feelings otherwise, GitHub Pages is perfect.</p> <p>I used to use Google Analytics but then they shut down the old version. The new version was incredibly confusing to use. I could not find some very basic information. So I moved to Fathom which has been great.</p> <p>I used to track all subscribers in a Google Form and bcc them but this became untenable eventually after 1000 subscribers due to GMail rate limits. I currently use MailerLite for subscriptions and sending email about new posts. But this is an absolutely terrible service. They proxy all links behind a domain that adblockers hate and they also visually shorten the URL so you can't copy the text of the URL.</p> <p>I just want a service that has a hosted form for collecting subscribers and a <code>&lt;textarea&gt;</code> that lets me dump raw HTML and send that as an email to my subscribers. No branding, no watermarks, no link proxying. This apparently doesn't exist. I am too lazy to figure out Amazon SES so I stick with MailerLite for now.</p> <h3 id="evolution">Evolution</h3><p>In the beginning I talked about little interpreters in JavaScript, about programming languages, about Scheme. 
I was into functional programming. Over time I moved into little emulators and bytecode VMs. And for the last four years I became obsessed with databases and distributed systems.</p> <p>I have almost always written about little projects to teach myself a concept. Writing a <a href="https://notes.eatonphil.com/lua-in-rust.html">bytecode VM in Rust</a>, <a href="https://notes.eatonphil.com/emulating-amd64-starting-with-elf.html">emulating a subset of x86 in Go</a>, <a href="https://notes.eatonphil.com/2023-05-25-raft.html">implementing Raft in Go</a>, <a href="https://notes.eatonphil.com/2024-05-16-mvcc.html">implementing MVCC isolation levels in Go</a>, and so on.</p> <p>So many times when I tried to learn a concept I would find blog posts with only partial code. The post would link to a GitHub repo that, by the time I got to the post, had evolved significantly beyond what was described in the post. The repo code had by then become too complex for me to follow. So I was motivated to write minimal implementations and walk through the code in its entirety.</p> <div class="note"> Even today there is not a single post on implementing TCP/IP from scratch that walks through entirely working code. (Please, someone write this.) 
</div><p>I have also had a blast writing survey posts such as <a href="https://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html">how various databases execute expressions</a>, <a href="https://notes.eatonphil.com/javascript-implementations.html">analyzing non-V8 JavaScript implementations</a>, <a href="https://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.html">how various programming language implementations parse code</a>, and <a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">how various database systems build on top of key-value databases</a>.</p> <p>The last two posts have even each been cited in a research paper (<a href="https://arxiv.org/pdf/2208.08235">here</a> and <a href="https://www.usenix.org/system/files/atc23-kaufman.pdf">here</a>).</p> <h3 id="editing">Editing</h3><p>In terms of quality, my single greatest trick is to read the post out loud. Multiple times. Notice parts that are awkward or unclear and rewrite them.</p> <p>My second greatest trick is to ask friends for review. Some posts like <a href="https://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html">an intuition for distributed consensus</a> and <a href="https://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html">a write-ahead log is not a universal part of durability</a> would simply not have been correct or credible without my fantastic reviewers. 
And I'm proud to have <a href="https://eatonphil.com/editor.html">played that part</a> a few times in turn.</p> <p>We also have a fantastic #writing-and-drafts channel on the <a href="https://eatonphil.com/discord.html">Software Internals Discord</a> where folks (myself occasionally included) come for post review.</p> <h3 id="context">Context</h3><p>I've lost count of the total number of times that these posts have been on the front page of Hacker News or that a tweet announcing a post has reached triple-digit likes. I think I've had 9 posts on the front page of HN this year. I do know that my single best year for HN was a 12-month stretch in 2022-2023 when 20 of my posts or projects were on the front page.</p> <p>Every time a post does well there's a part of me that worries that I've peaked. But the way to deal with this has been to ignore that little voice and to just keep learning new things. I haven't stopped finding things confusing yet, and <a href="https://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.html">confusion is a phenomenal muse</a>.</p> <p>And also to, like, go out and meet friends for dinner, <a href="https://nycsystems.xyz/">run</a> <a href="https://eatonphil.com/nyc-systems-coffee-club.html">meetups</a>, run <a href="https://eatonphil.com/bookclub.html">book clubs</a>, <a href="https://eatonphil.com/chat.html">chat</a> with you fascinating internet strangers, play volleyball, and so on.</p> <p>It's always been about <a href="https://notes.eatonphil.com/2024-08-24-obsession.html">cultivating healthy obsessions</a>.</p> <h3 id="benediction">Benediction</h3><p>In parting, I'll remind you:</p> <ul> <li><a href="https://notes.eatonphil.com/is-it-worth-writing-about.html">It is definitely worth writing about</a>, whatever "it" is</li> <li><a href="https://twitter.com/eatonphil/status/1854965419745972394">You're not writing enough</a></li> <li>And <a href="https://eatonphil.com/call-for-posts.html">some ideas for posts I want to hear about if you write 
about them</a></li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a little reflection on writing after noticing I passed 1M page views this morning.<a href="https://t.co/eIlMDVHNht">https://t.co/eIlMDVHNht</a> <a href="https://t.co/EKSiiDUz5G">pic.twitter.com/EKSiiDUz5G</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1862174926104318407?ref_src=twsrc%5Etfw">November 28, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-11-28-1-million-views.htmlThu, 28 Nov 2024 00:00:00 +0000Active and influential NYC infrastructure peoplehttp://notes.eatonphil.com/2024-11-15-active-nyc-infrastructure-people.html<p>These are some of the most influential (mostly due to experience or expertise) and active folks (I actually see them attend events) in the NYC infrastructure scene (that I have a personal connection to).</p> <p>If you're running a dinner or are just looking to meet interesting people in NYC in software infrastructure, consider this list and feel free to mention "Phil said you are awesome".</p> <p>I've normalized titles a little bit but I say every title in the most generous way. These folks are brilliant.</p> <p>This list is intentionally randomized. Also not a complete list. 
I've surely forgotten (let alone not yet met) great folk.</p> <ul> <li><a href="https://www.linkedin.com/in/parkertimmerman/">Parker Timmerman</a>, developer</li> <li><a href="https://www.linkedin.com/in/mottaqui-karim/">Taq Karim</a>, director of engineering</li> <li><a href="https://malloc.dog/about/">Peixian Wang</a>, developer</li> <li><a href="https://www.linkedin.com/in/sujayakar/">Sujay Jayakar</a>, chief scientist</li> <li><a href="https://www.linkedin.com/in/pauldix/">Paul Dix</a>, ceo</li> <li><a href="https://www.linkedin.com/in/angelo-saraceno/">Angelo Saraceno</a>, developer</li> <li><a href="https://www.linkedin.com/in/taylor-baldwin-642b4818/">Taylor Baldwin</a>, cto</li> <li><a href="https://www.linkedin.com/in/blinsay/">Ben Linsay</a>, cto</li> <li><a href="https://www.linkedin.com/in/nicholasursa/">Nicholas Ursa</a>, developer</li> <li><a href="https://www.linkedin.com/in/samgross/">Sam Gross</a>, developer</li> <li><a href="https://www.linkedin.com/in/tramale-turner-31b24a/">Tramale Turner</a>, vp of engineering</li> <li><a href="https://www.linkedin.com/in/justinjaffray/">Justin Jaffray</a>, developer</li> <li><a href="https://www.linkedin.com/in/kwosei/">Kojo Osei</a>, vc</li> <li><a href="https://www.linkedin.com/in/bryanrussett/">Bryan Russett</a>, ceo</li> <li><a href="https://www.linkedin.com/in/guilload/">Adrien Guillo</a>, cofounder</li> <li><a href="https://www.linkedin.com/in/thiagoghisi/">Thiago Ghisi</a>, director of engineering</li> <li><a href="https://www.linkedin.com/in/gilbert-forsyth-1a368240/">Gil Forsyth</a>, developer</li> <li><a href="https://www.linkedin.com/in/dan-fried-57b0178/">Dan Fried</a>, cto</li> <li><a href="https://www.linkedin.com/in/davidagolden/">David Golden</a>, director of engineering</li> <li><a href="https://www.linkedin.com/in/akshat-bubna-188885103/">Akshat Bubna</a>, cto</li> <li><a href="https://www.linkedin.com/in/andrew-werner-8228a438/">Andrew Werner</a>, cofounder</li> <li><a 
href="https://www.linkedin.com/in/voberoi/">Vikram Oberoi</a>, founder</li> <li><a href="https://www.linkedin.com/in/samkottler/">Sam Kottler</a>, developer</li> <li><a href="https://www.linkedin.com/in/jordanthelewis/">Jordan Lewis</a>, director of engineering</li> <li><a href="https://www.linkedin.com/in/mykolakurutin/">Mykola Kurutin</a>, engineering manager</li> <li><a href="https://www.linkedin.com/in/paulormg/">Paulo Motta</a>, developer</li> <li><a href="https://www.linkedin.com/in/priyanka-somrah/">Priyanka Somrah</a>, vc</li> <li><a href="https://www.linkedin.com/in/jzelinskie/">Jimmy Zelinskie</a>, cpo</li> <li><a href="https://www.linkedin.com/in/vy-ton/">Vy Ton</a>, product manager</li> <li><a href="https://www.linkedin.com/in/viega/">John Viega</a>, ceo</li> <li><a href="https://www.linkedin.com/in/benburkert/">Ben Burkert</a>, cto</li> <li><a href="https://www.linkedin.com/in/petevilter/">Pete Vilter</a>, developer</li> <li><a href="https://www.linkedin.com/in/seanloiselle/">Sean Loiselle</a>, developer</li> <li><a href="https://www.linkedin.com/in/rahul-lath/">Rahul Lath</a>, vp of engineering</li> <li><a href="https://www.linkedin.com/in/kelleymak/">Kelley Mak</a>, vc</li> <li><a href="https://www.linkedin.com/in/ramrengaswamy/">Ram Kumar Rengaswamy</a>, cofounder</li> <li><a href="https://www.linkedin.com/in/oridb/">Ori Bernstein</a>, consultant</li> <li><a href="https://www.linkedin.com/in/mitchsw/">Mitch Ward</a>, director of engineering</li> <li><a href="https://www.linkedin.com/in/philippemnoel/">Philippe Noël</a>, ceo</li> <li><a href="https://www.linkedin.com/in/paulgb/">Paul Butler</a>, ceo</li> <li><a href="https://www.linkedin.com/in/mathable/">Abel Mathew</a>, cofounder</li> <li><a href="https://www.linkedin.com/in/apacker/">Andrew Packer</a>, developer</li> <li><a href="https://www.linkedin.com/in/clipperhouse/">Matt Sherman</a>, engineering manager</li> <li><a href="https://www.linkedin.com/in/seshendranalla/">Sesh Nalla</a>, director 
of engineering</li> <li><a href="https://www.linkedin.com/in/andrei-matei-9401083/">Andrei Matei</a>, cofounder</li> <li><a href="https://www.linkedin.com/in/ryanmwexler/">Ryan Wexler</a>, vc</li> <li><a href="https://www.linkedin.com/in/alexkesling/">Alex Kesling</a>, cto</li> <li><a href="https://www.linkedin.com/in/larrytheliquid/">Larry Diehl</a>, ceo</li> <li><a href="https://www.linkedin.com/in/will-manning-maker-of-things/">Will Manning</a>, ceo</li> <li><a href="https://www.linkedin.com/in/paul-nowoczynski-42a5267/">Paul Nowoczynski</a>, founder</li> <li><a href="https://www.linkedin.com/in/alexsarkesian/">Alex Sarkesian</a>, developer</li> <li><a href="https://www.linkedin.com/in/meganalicereynolds/">Megan Reynolds</a>, vc</li> <li><a href="https://www.linkedin.com/in/nikhilbenesch/">Nikhil Benesch</a>, cto</li> <li><a href="https://www.linkedin.com/in/saleh-hindi/">Saleh Hindi</a>, founder</li> <li><a href="https://www.linkedin.com/in/stephaniewang526/">Stephanie Wang</a>, developer</li> <li><a href="https://www.linkedin.com/in/just-be/">Justin Bennett</a>, cofounder</li> <li><a href="https://www.linkedin.com/in/evanmarkschwartz/">Evan Schwartz</a>, developer</li> <li><a href="https://www.linkedin.com/in/ekzhang/">Eric Zhang</a>, developer</li> </ul> http://notes.eatonphil.com/2024-11-15-active-nyc-infrastructure-people.htmlFri, 15 Nov 2024 00:00:00 +0000Exploring Postgres's arena allocator by writing an HTTP server from scratchhttp://notes.eatonphil.com/2024-11-06-exploring-postgress-arena-allocator-writing-http-server-scratch.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/exploring-postgress-arena-allocator-writing-http-server-scratch'" /> </head><p>This is an external post of mine. 
Click <a href="https://www.enterprisedb.com/blog/exploring-postgress-arena-allocator-writing-http-server-scratch">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2024-11-06-exploring-postgress-arena-allocator-writing-http-server-scratch.htmlWed, 06 Nov 2024 00:00:00 +0000Effective unemployment and social mediahttp://notes.eatonphil.com/2024-11-05-effective-unemployment-and-social-media.html<p>Being unemployed can be incredibly depressing. So much rejection. Everything seems to be out of your control. Everything except for one thing: what you produce.</p> <p>You might know that repeatedly posting on social media that you are looking for work is ineffective. That it looks (or at least feels) worse each time you say so. But there is at least one major caveat to this.</p> <p>Every single time you create something and share it publicly is a chance to also reiterate that you are looking for work. And people actually appreciate and value this!</p> <p>Whether you write a blog post or build some project, you are seen as working on yourself and contributing to the community. Positive things! And it is no problem at all to mention with each new post you write and each new project you publish that you are also looking for work.</p> <p>Moreover, the dynamics of the internet and social media basically require that you be regularly producing something new. Either regularly producing a new version of some existing project or regularly producing new projects (or blog posts) entirely.</p> <p>What you did a week ago is old news on social media. What will you do next week?</p> <p>This could itself feel depressing except that it's probably a fairly healthy thing for you anyway! It is a motivation to keep your skills sharp as time goes on.</p> <p>So while you're unemployed and able to muster the motivation, write about things that are interesting to you! Build projects that intrigue you. 
Leave a little note on every post and project that you are looking for work. And share every post and project on social media.</p> <p>You'll expose yourself to opportunities and referrals. And even if no post or project "takes off" you will still be working on yourself and contributing back knowledge to the community.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short post on some ideas for effective unemployment and social media.<a href="https://t.co/jmiJCOe2Nk">https://t.co/jmiJCOe2Nk</a> <a href="https://t.co/pK9AySNdHR">pic.twitter.com/pK9AySNdHR</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1853800075564109880?ref_src=twsrc%5Etfw">November 5, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-11-05-effective-unemployment-and-social-media.htmlTue, 05 Nov 2024 00:00:00 +0000Checking linearizability in Gohttp://notes.eatonphil.com/2024-10-31-checking-linearizability-in-go.html<p><!-- -*- mode: markdown -*- --></p> <p>You want to check for strict consistency (<a href="https://jepsen.io/consistency/models/linearizable">linearizability</a>) for your project but you don't want to have to <a href="https://github.com/jepsen-io/">deal with the JVM</a>. <a href="https://github.com/anishathalye/porcupine">Porcupine</a>, used by a number of real-world systems like etcd and TiDB, has you covered!</p> <p>Importantly, neither Jepsen projects nor Porcupine can <em>prove</em> linearizability. They can only help you <em>build confidence</em> that you aren't obviously <em>violating</em> linearizability.</p> <p>The Porcupine README is pretty good but doesn't give complete working code, so I'm going to walk through checking linearizability of a distributed register. 
And then we'll tweak things a bit by checking linearizability for a distributed key-value store.</p> <p>But rather than implementing a distributed register and implementing a distributed key-value store, to keep this post concise, we're just going to imagine that they exist and we'll come up with some example histories we might see.</p> <p>Code for this post can be found on <a href="https://github.com/eatonphil/linearizability-playground">GitHub</a>.</p> <h3 id="boilerplate">Boilerplate</h3><p>Create a new directory and <code>go mod init lintest</code>. Let's add the imports we need and a helper function for generating a visualization of a history, in <code>main.go</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;github.com/anishathalye/porcupine&quot;</span> <span class="kd">func</span><span class="w"> </span><span class="nx">visualizeTempFile</span><span class="p">(</span><span class="nx">model</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Model</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">LinearizationInfo</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">file</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">CreateTemp</span><span class="p">(</span><span class="s">&quot;&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;*.html&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;failed to create temp file&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Visualize</span><span class="p">(</span><span class="nx">model</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">,</span><span class="w"> </span><span class="nx">file</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;visualization failed&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;wrote visualization to %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">file</span><span class="p">.</span><span class="nx">Name</span><span class="p">())</span> <span class="p">}</span> </pre></div> <h3 id="a-distributed-register">A distributed register</h3><p>A distributed register is like a distributed key-value store but there's only a single 
key.</p> <p>We need to tell Porcupine what the inputs and outputs for this system are. And we'll later describe for it how an idealized version of this system should behave as it receives each input; what output the idealized version should produce.</p> <p>Each time we send a command to the distributed register it will include an operation (to get or to set the register). And if it is a set command it will include a value.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">registerInput</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="c1">// &quot;get&quot; and &quot;set&quot;</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">int</span> <span class="p">}</span> </pre></div> <p>The register is an integer register.</p> <p>Now we will define a model for Porcupine which, again, is the idealized version of this system.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">registerModel</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">porcupine</span><span class="o">.</span><span class="n">Model</span><span class="p">{</span> <span class="w"> </span><span class="n">Init</span><span class="p">:</span><span class="w"> </span><span class="k">func</span><span class="p">()</span><span class="w"> </span><span class="n">any</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span 
class="p">},</span> <span class="w"> </span><span class="n">Step</span><span class="p">:</span><span class="w"> </span><span class="k">func</span><span class="p">(</span><span class="n">stateAny</span><span class="p">,</span><span class="w"> </span><span class="n">inputAny</span><span class="p">,</span><span class="w"> </span><span class="n">outputAny</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb nb-Type">bool</span><span class="p">,</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">input</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">inputAny</span><span class="o">.</span><span class="p">(</span><span class="n">registerInput</span><span class="p">)</span> <span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">outputAny</span><span class="o">.</span><span class="p">(</span><span class="nb nb-Type">int</span><span class="p">)</span> <span class="w"> </span><span class="n">state</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">stateAny</span><span class="o">.</span><span class="p">(</span><span class="nb nb-Type">int</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">input</span><span class="o">.</span><span class="n">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">&quot;set&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">true</span><span class="p">,</span><span class="w"> 
</span><span class="n">input</span><span class="o">.</span><span class="n">value</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">input</span><span class="o">.</span><span class="n">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">&quot;get&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">readCorrectValue</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">state</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">readCorrectValue</span><span class="p">,</span><span class="w"> </span><span class="n">state</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="s2">&quot;Unexpected operation&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>The step function accepts <code>any</code> values because Porcupine has to be able to model any sort of system, each with its own inputs, outputs, and state. So we have to handle casting from the <code>any</code> type to what we know are the input, output, and state types. Finally we do the state change and return whether the given output matches what we know it should be, along with the new state.</p> <h3 id="an-invalid-history">An invalid history</h3><p>So far we've only defined the idealized version of this system. Let's pretend we have some real-world implementation of this. 
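</p> <p>Before feeding histories to Porcupine, it can help to sanity-check the step logic by hand. What follows is a minimal sketch and not part of the original code: <code>registerStep</code>, <code>step</code>, and <code>replay</code> are hypothetical names, and <code>registerStep</code> is a concretely typed copy of the <code>Step</code> function above so that it runs without Porcupine. It replays two sets followed by two reads in one fixed order and reports whether every observed output was legal.</p>

```go
package main

import "fmt"

// registerInput mirrors the struct defined earlier in the post.
type registerInput struct {
	operation string // "get" or "set"
	value     int
}

// registerStep is a hypothetical, concretely typed copy of the model's Step
// logic. It reports whether the observed output is legal for the current
// state, and returns the next state.
func registerStep(state int, input registerInput, output int) (bool, int) {
	switch input.operation {
	case "set":
		// A set is always legal and replaces the register's value.
		return true, input.value
	case "get":
		// A get is legal only if it observed the current value.
		return output == state, state
	}
	panic("unexpected operation")
}

// step pairs an input with the output the system reported for it.
type step struct {
	input  registerInput
	output int
}

// replay applies the steps in the given order and reports whether every
// observed output was legal for the state at that point.
func replay(steps []step) bool {
	state := 0 // same starting value as Init in the model
	for _, s := range steps {
		ok, next := registerStep(state, s.input, s.output)
		if !ok {
			return false
		}
		state = next
	}
	return true
}

func main() {
	stale := []step{
		{registerInput{"set", 100}, 100},
		{registerInput{"set", 200}, 200},
		{registerInput{"get", 0}, 200},
		{registerInput{"get", 0}, 100}, // reads the old value
	}
	fresh := []step{
		{registerInput{"set", 100}, 100},
		{registerInput{"set", 200}, 200},
		{registerInput{"get", 0}, 200},
		{registerInput{"get", 0}, 200},
	}
	fmt.Println("stale history legal:", replay(stale))
	fmt.Println("fresh history legal:", replay(fresh))
}
```

<p>This sketch checks only one fixed ordering. As far as I understand it, Porcupine instead searches for any ordering of the operations that is consistent with their call and return times and that the model accepts at every step.</p> <p>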
We might have two clients and they might issue concurrent get and set requests.</p> <p>Every time we stimulate the system we will generate a new history that we can validate with Porcupine against our model to see if the history is linearizable.</p> <p>Let's imagine these two clients concurrently set the register to some value. Both sets succeed. Then both clients read the register. And they get different values. Here's what that history would look like modeled for Porcupine.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span> <span class="w"> </span><span class="c1">// Client 3 sets the register to 100. The request starts at t0 and ends at t2.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="cm">/* end state at t2 is 100 */</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 sets the register to 200. 
The request starts at t3 and ends at t4.</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="cm">/* end state at t4 is 200 */</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 3 reads the register. The request starts at t5 and ends at t6.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 reads the register. The request starts at t7 and ends at t8. 
Reads a stale value!</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">CheckOperationsVerbose</span><span class="p">(</span><span class="nx">registerModel</span><span class="p">,</span><span class="w"> </span><span class="nx">ops</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">visualizeTempFile</span><span class="p">(</span><span class="nx">registerModel</span><span class="p">,</span><span class="w"> </span><span class="nx">info</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;expected operations to be linearizable&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span 
class="p">}</span> </pre></div> <p>If we build and run this code:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="k">mod</span><span class="w"> </span><span class="n">tidy</span> <span class="k">go</span><span class="err">:</span><span class="w"> </span><span class="n">finding</span><span class="w"> </span><span class="k">module</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">package</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span> <span class="k">go</span><span class="err">:</span><span class="w"> </span><span class="k">found</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">anishathalye</span><span class="o">/</span><span class="n">porcupine</span><span class="w"> </span><span class="n">v0</span><span class="mf">.1.6</span> <span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">build</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">lintest</span> <span class="mi">2024</span><span class="o">/</span><span class="mi">10</span><span class="o">/</span><span class="mi">31</span><span class="w"> </span><span class="mi">19</span><span class="err">:</span><span class="mi">54</span><span class="err">:</span><span class="mi">08</span><span class="w"> 
</span><span class="n">wrote</span><span class="w"> </span><span class="n">visualization</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="o">/</span><span class="nf">var</span><span class="o">/</span><span class="n">folders</span><span class="o">/</span><span class="n">cb</span><span class="o">/</span><span class="n">v27m749d0sj89h9ydfq0f0940000gn</span><span class="o">/</span><span class="n">T</span><span class="o">/</span><span class="mf">463308000.</span><span class="n">html</span> <span class="nl">panic</span><span class="p">:</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="n">operations</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="n">linearizable</span> <span class="n">goroutine</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">[</span><span class="n">running</span><span class="o">]</span><span class="err">:</span> <span class="n">main</span><span class="p">.</span><span class="n">main</span><span class="p">()</span> <span class="w"> </span><span class="o">/</span><span class="n">Users</span><span class="o">/</span><span class="n">phil</span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">lintest</span><span class="o">/</span><span class="n">main</span><span class="p">.</span><span class="k">go</span><span class="err">:</span><span class="mi">59</span><span class="w"> </span><span class="o">+</span><span class="mh">0x394</span> </pre></div> <p>Porcupine caught the stale value. Open that HTML file to see the visualization.</p> <p><img src="/assets/bad-register-history.png" alt="/assets/bad-register-history.png"></p> <h3 id="a-valid-history">A valid history</h3><p>Let's say we fix the bug so now there's no stale read. 
The new history would look like this:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span> <span class="w"> </span><span class="c1">// Client 3 sets the register to 100. The request starts at t0 and ends at t2.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="cm">/* end state at t2 is 100 */</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 sets the register to 200. The request starts at t3 and ends at t4.</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="w"> </span><span class="cm">/* end state at t4 is 200 */</span><span class="p">,</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 3 reads the register. 
The request starts at t5 and ends at t6.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 reads the register. The request starts at t7 and ends at t8.</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">registerInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Rebuild, rerun <code>lintest</code> (it should exit successfully now), and open the visualization.</p> <p><img src="/assets/good-register-history.png" alt="/assets/good-register-history.png"></p> <p>Great! Now let's make things a little more complicated by modeling a distributed key-value store rather than a distributed register.</p> <h3 id="distributed-key-value">Distributed key-value</h3><p>The inputs of this system will be slightly more complex. 
They will take a <code>key</code> along with the <code>operation</code> and <code>value</code>.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kvInput</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="c1">// &quot;get&quot; or &quot;set&quot;</span> <span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">int</span> <span class="p">}</span> </pre></div> <p>And we model the distributed key-value store with the state and output at each step being a <code>map[string]int</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">kvModel</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">porcupine</span><span class="p">.</span><span class="n">Model</span><span class="err">{</span> <span class="w"> </span><span class="nl">Init</span><span class="p">:</span><span class="w"> </span><span class="n">func</span><span class="p">()</span><span class="w"> </span><span class="ow">any</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="err">{}</span> <span class="w"> </span><span class="err">}</span><span class="p">,</span> <span class="w"> </span><span class="nl">Step</span><span class="p">:</span><span class="w"> </span><span class="n">func</span><span class="p">(</span><span class="n">stateAny</span><span class="p">,</span><span class="w"> 
</span><span class="n">inputAny</span><span class="p">,</span><span class="w"> </span><span class="n">outputAny</span><span class="w"> </span><span class="ow">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">bool</span><span class="p">,</span><span class="w"> </span><span class="ow">any</span><span class="p">)</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">input</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">inputAny</span><span class="p">.(</span><span class="n">kvInput</span><span class="p">)</span> <span class="w"> </span><span class="k">output</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">outputAny</span><span class="p">.(</span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="p">)</span> <span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">stateAny</span><span class="p">.(</span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="nc">int</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="ss">&quot;set&quot;</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">newState</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span 
class="nc">int</span><span class="err">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="k">state</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">newState</span><span class="o">[</span><span class="n">k</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">newState</span><span class="o">[</span><span class="n">input.key</span><span class="o">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">value</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">true</span><span class="p">,</span><span class="w"> </span><span class="n">newState</span> <span class="w"> </span><span class="err">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">input</span><span class="p">.</span><span class="k">operation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="ss">&quot;get&quot;</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">readCorrectValue</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="k">output</span><span class="o">[</span><span class="n">input.key</span><span class="o">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="k">state</span><span 
class="o">[</span><span class="n">input.key</span><span class="o">]</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">readCorrectValue</span><span class="p">,</span><span class="w"> </span><span class="k">state</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="ss">&quot;Unexpected operation&quot;</span><span class="p">)</span> <span class="w"> </span><span class="err">}</span><span class="p">,</span> <span class="w"> </span><span class="err">}</span> </pre></div> <p>The history gets slightly more complex because each operation now targets a specific key. Otherwise we'll use the same history as before.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ops</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">porcupine</span><span class="p">.</span><span class="nx">Operation</span><span class="p">{</span> <span class="w"> </span><span class="c1">// Client 3 sets key `a` to 100. 
The request starts at t0 and ends at t2.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">100</span><span class="p">},</span><span class="w"> </span><span class="mi">2</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 set key `a` to 200. The request starts at t3 and ends at t4.</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">4</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 3 read key `a`. 
The request starts at t5 and ends at t6.</span> <span class="w"> </span><span class="p">{</span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">6</span><span class="p">},</span> <span class="w"> </span><span class="c1">// Client 5 read key `a`. 
The request starts at t7 and ends at t8.</span> <span class="w"> </span><span class="p">{</span><span class="mi">5</span><span class="p">,</span><span class="w"> </span><span class="nx">kvInput</span><span class="p">{</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* doesn&#39;t matter */</span><span class="p">},</span><span class="w"> </span><span class="mi">7</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">},</span><span class="w"> </span><span class="mi">8</span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Build and run. Open the visualization.</p> <p><img src="/assets/good-kv-history.png" alt="/assets/good-kv-history.png"></p> <p>And there we go!</p> <h3 id="what's-next">What's next</h3><p>These are just a few simple examples that are not hooked up to a real system. But it still seemed useful to show how to model a couple of simple systems and check a history for each with Porcupine.</p> <p>Another aspect of Porcupine I did not cover is partitioning the state space. The <a href="https://pkg.go.dev/github.com/anishathalye/porcupine#Model">docs</a> say:</p> <blockquote><p>Implementing the partition functions can greatly improve performance. If you're implementing the partition function, the model Init and Step functions can be per-partition. 
For example, if your specification is for a key-value store and you partition by key, then the per-partition state representation can just be a single value rather than a map.</p> </blockquote> <p>Perhaps that, and hooking this up to some "real" system, would be a good next step.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short tutorial on using Porcupine to check for linearizability (without needing to deal with the JVM).<a href="https://t.co/kqeBz2jX76">https://t.co/kqeBz2jX76</a> <a href="https://t.co/teXvlp2zcv">pic.twitter.com/teXvlp2zcv</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1852143540131844109?ref_src=twsrc%5Etfw">November 1, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-10-31-checking-linearizability-in-go.htmlThu, 31 Oct 2024 00:00:00 +0000Build a serverless ACID database with this one neat trick (atomic PutIfAbsent)http://notes.eatonphil.com/2024-09-29-build-a-serverless-acid-database-with-this-one-neat-trick.html<p>Delta Lake is an open protocol for serverless ACID databases. Due to its simplicity, scalability, and the number of open-source implementations, it's quickly becoming the DuckDB of serverless transactional databases for analytics workloads. Iceberg is a contender too, and is similar in many ways. But since Delta Lake is simpler (simple != better) that's where we'll focus in this post.</p> <p>Delta Lake has one of the most accessible database papers I've read (<a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">link</a>). It's kind of like the <a href="https://github.com/xoreaxeaxeax/movfuscator">movfuscator</a> of databases.</p> <p>Thanks to its simplicity, in this post we'll implement a Delta Lake-inspired serverless ACID database in 500 lines of Go code with zero dependencies. 
It will support creating tables, inserting rows into a table, and scanning all rows in a table. All while allowing concurrent readers and writers and achieving <a href="https://jepsen.io/consistency">snapshot isolation</a>.</p> <p>There are other critical parts of Delta Lake we'll ignore: updating rows, deleting rows, checkpointing the transaction metadata log, compaction, and probably much more I'm not aware of. We must start somewhere.</p> <p>All code for this post is <a href="https://github.com/eatonphil/otf">available on GitHub</a>.</p> <h3 id="delta-lake-basics">Delta Lake basics</h3><p>Delta Lake writes immutable data files to blob storage. It stores the names of new data files for a transaction in a metadata file. It handles concurrency (i.e. achieves snapshot isolation) with an atomic PutIfAbsent operation on the metadata file for the transaction.</p> <p>This method of concurrency control works because the metadata files follow a naming scheme that includes the transaction id in the file name. When a new transaction starts, it finds all existing metadata files and picks its own transaction id by adding 1 to the largest transaction id it sees.</p> <p>When a transaction goes to commit, writing the metadata file will fail if another transaction has already picked the same transaction id.</p> <p>If a transaction does no writes and creates no tables, the transaction does not attempt to write any metadata file. Snapshot isolation!</p> <p>Let's dig into the implementation.</p> <h3 id="boilerplate">Boilerplate</h3><p>Let's give ourselves some nice assertion methods, a debug method, and a uuid generator. 
In <code>main.go</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;path&quot;</span> <span class="w"> </span><span class="s">&quot;slices&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">assertEq</span><span class="p">[</span><span class="nx">C</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">a</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span 
class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s &#39;%v&#39; != &#39;%v&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">var</span><span class="w"> </span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">slices</span><span class="p">.</span><span class="nx">Contains</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;--debug&quot;</span><span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">args</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;[DEBUG]&quot;</span><span class="p">},</span><span class="w"> </span><span class="nx">a</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">args</span><span class="o">...</span><span class="p">)</span> <span class="p">}</span> <span class="c1">// https://datatracker.ietf.org/doc/html/rfc4122#section-4.4</span> <span class="kd">func</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">&quot;/dev/random&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;could not open /dev/random: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">buf</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">)</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;could not read 16 bytes from /dev/random: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">buf</span><span class="p">),</span><span class="w"> </span><span class="s">&quot;expected 16 bytes from /dev/random&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Set bit 6 to 0</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> 
</span><span class="mi">6</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Set bit 7 to 1</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">7</span> <span class="w"> </span><span class="c1">// Set version</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">4</span><span class="p">)</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;=</span><span class="w"> </span><span class="p">^(</span><span class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">6</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span><span class="w"> </span><span class="o">&amp;=</span><span class="w"> </span><span class="p">^(</span><span 
class="nb">byte</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="mi">7</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%x-%x-%x-%x-%x&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[:</span><span class="mi">4</span><span class="p">],</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">4</span><span class="p">:</span><span class="mi">6</span><span class="p">],</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">6</span><span class="p">:</span><span class="mi">8</span><span class="p">],</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">10</span><span class="p">],</span> <span class="w"> </span><span class="nx">buf</span><span class="p">[</span><span class="mi">10</span><span class="p">:</span><span class="mi">16</span><span class="p">])</span> <span class="p">}</span> </pre></div> <p>Is that uuid method correct? Hopefully. Efficient? No. But it's preferable to avoid dependencies in pedagogical projects.</p> <p>Moving on.</p> <h3 id="blob-storage-requirements">Blob storage requirements</h3><p>As mentioned above, the basic requirement is that we support atomically writing some bytes to a location if the location doesn't already exist.</p> <p>On top of that we also need the ability to list locations by prefix, and the ability to read the bytes at some location.</p> <p class="note"> We'll diverge from Delta Lake in how we name files on disk. 
For one, we'll keep all files in the same directory, with a fixed prefix for metadata files and a table-name prefix for each data file. This simplifies the implementation of <code>listPrefix</code> a bit. <br /> <br /> This also diverges from Delta Lake in that a single transaction log will cover all tables. Delta Lake instead keeps a per-table transaction log, so only transactions that read and write the same table achieve snapshot isolation. </p><p>So let's set up an interface to describe these requirements:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">objectStorage</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Must be atomic.</span> <span class="w"> </span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span> <span class="w"> </span><span class="nx">read</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span> <span 
class="p">}</span> </pre></div> <p>And this is literally all we need to get ACID transactions. That's crazy!</p> <h4 id="atomic-put-and-cloud-blob-storage">Atomic Put and cloud blob storage</h4><p>We could implement the atomic <code>putIfAbsent</code> part of this interface in 2024 using <a href="https://aws.amazon.com/about-aws/whats-new/2024/08/amazon-s3-conditional-writes/">conditional writes</a> on S3. Or we could implement it with the <code>If-None-Match</code> <a href="https://learn.microsoft.com/en-us/rest/api/storageservices/specifying-conditional-headers-for-blob-service-operations">header</a> on Azure Blob Storage. Or we could implement it with the <code>x-goog-if-generation-match</code> <a href="https://cloud.google.com/storage/docs/xml-api/put-object">header</a> (set to <code>0</code>) on Google Cloud Storage.</p> <p>Indeed, a good exercise for the reader would be to implement this interface for other blob storage providers and see your serverless cloud database in action!</p> <p>But the simplest method of all is to implement it on the filesystem, which is what we'll do next.</p> <h3 id="a-filesystem-blob-store">A filesystem blob store</h3><p>If we had a server we could implement atomic <code>putIfAbsent</code> with a mutex. But we're serverless, baby. Thankfully, POSIX <a href="https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html">supports atomic link</a>, which fails if the new name already exists.</p> <p>So we'll just create a temporary file and write out all bytes. Finally, we link the temporary file to the permanent name we intended. 
For cleanliness (not correctness), if there is an error at any point, we'll remove the temporary file.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">fileObjectStorage</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">basedir</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">basedir</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">fileObjectStorage</span><span class="p">{</span><span class="nx">basedir</span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tmpfilename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span 
class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">())</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_WRONLY</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="p">,</span><span class="w"> </span><span class="mo">0644</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nx">bufSize</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">1024</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">16</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bytes</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">toWrite</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">written</span><span class="o">+</span><span class="nx">bufSize</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bytes</span><span class="p">))</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">bytes</span><span class="p">[</span><span class="nx">written</span><span class="p">:</span><span class="nx">toWrite</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not remove&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">written</span><span 
class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Sync</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not remove&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not remove&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Link</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">,</span><span class="w"> </span><span class="nx">filename</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">removeErr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Remove</span><span class="p">(</span><span class="nx">tmpfilename</span><span class="p">)</span> <span 
class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">removeErr</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not remove&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p class="note"> <a href="https://news.ycombinator.com/item?id=41702593">yencabulator</a> on HN pointed out that an earlier version of this post had a buggy implementation of <code>putIfAbsent</code>: it attempted to manage atomicity solely via <code>O_EXCL | O_CREAT</code> and would leave around potentially bad metadata files if the <code>os.Remove</code> call ever failed. <br /> <br /> The <code>link</code> approach avoids that because the file is already fully and correctly written by the time we do the link. 
</p><p><code>listPrefix</code> and <code>read</code> are minimal wrappers around filesystem APIs:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">)</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span 
class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">names</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">names</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Readdirnames</span><span class="p">(</span><span class="mi">100</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> 
</span><span class="nx">names</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">HasPrefix</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">files</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">files</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">files</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fos</span><span class="w"> </span><span class="o">*</span><span class="nx">fileObjectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">read</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span 
class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">fos</span><span class="p">.</span><span class="nx">basedir</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">filename</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>It is worth talking a bit about reading a directory though. Go doesn't provide a nice iterator API for us and I didn't want to implement this as callbacks with <a href="https://pkg.go.dev/path/filepath#WalkDir"><code>path/filepath.WalkDir</code></a>.</p> <p>We could use <a href="https://pkg.go.dev/os#File.ReadDir"><code>os.File.ReadDir</code></a> but it allocates for all files in the directory. Sure, in a pedagogical project we don't worry about millions of files. But the <code>ReadDir</code> API, the error cases in particular, also isn't much simpler than <a href="https://pkg.go.dev/os#File.Readdirnames"><code>Readdirnames</code></a>.</p> <p class="note"> What's more, even though we iterated through batches of directory entries, and did prefix filtering before accumulating, we still could have considered returning an iterator here ourselves. It seems possible and likely that the number of data files grows quite large in a production system. But I was lazy. </p><p>It would be nice if Go introduced an actual iterator API for reading a directory. 
:)</p> <h4 id="delta-lake-and-stale-reads">Delta Lake and stale reads</h4><p>In any case the ACID properties of Delta Lake (and Iceberg) don't depend on being able to read up-to-date data.</p> <p>This is because concurrent (or stale) transactions that <em>write</em> will <em>fail on commit</em>. And also because all files written (even metadata files) are immutable.</p> <p>Since all data is immutable, we will always be able to read at least a consistent snapshot of data. But we will never be able to get SERIALIZABLE <strong>read-only</strong> transactions. This is just how Delta Lake and Iceberg work. And it is a consistency level <a href="https://jepsen.io/consistency">similar</a> to or better than what any major SQL database <a href="https://github.com/ept/hermitage">gives you by default</a>.</p> <p>You'll see what I mean later on when we implement transaction commits.</p> <h3 id="transaction-boilerplate">Transaction boilerplate</h3><p>Now that we've got a blob storage abstraction and a filesystem implementation of it, let's start sketching out what a client and a transaction look like.</p> <p>In Delta Lake, a transaction consists of a list of actions. An action might be to define a table's schema, or to add a data file, or to remove a data file, etc. 
In this post we'll only implement the first two actions.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">DataobjectAction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ChangeMetadataAction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">Columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="p">}</span> <span class="c1">// an enum, only one field will be non-nil</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Action</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">*</span><span class="nx">DataobjectAction</span> <span class="w"> </span><span class="nx">ChangeMetadata</span><span class="w"> </span><span class="o">*</span><span class="nx">ChangeMetadataAction</span> <span class="w"> </span><span class="c1">// TODO: Support object removal.</span> <span class="w"> </span><span class="c1">// DeleteDataobject *DataobjectAction</span> <span class="p">}</span> </pre></div> <p>These fields are all exported (i.e. 
capitalized, if you're not familiar with Go) because we will be writing them to disk when the transaction commits as the transaction's metadata.</p> <p>In fact <code>Action</code>s and the transaction's id will be the only parts of the transaction we write to disk. Everything else will be in-memory state.</p> <p>For our convenience we will track in memory a history of all previous actions, a mapping of table columns, and a mapping of unflushed data by table.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">transaction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Id</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="c1">// Both are mapping table name to a list of actions on the table.</span> <span class="w"> </span><span class="nx">previousActions</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span> <span class="w"> </span><span class="nx">Actions</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span> <span class="w"> </span><span class="c1">// Mapping tables to column names.</span> <span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="kt">string</span> <span class="w"> </span><span class="c1">// Mapping table name to unflushed/in-memory rows. 
When rows</span> <span class="w"> </span><span class="c1">// are flushed, the dataobject that contains them is added to</span> <span class="w"> </span><span class="c1">// `tx.Actions` above and `tx.unflushedDataPointer[table]` is</span> <span class="w"> </span><span class="c1">// reset to `0`.</span> <span class="w"> </span><span class="nx">unflushedData</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">unflushedDataPointer</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span> <span class="p">}</span> </pre></div> <p>Only the current <code>transaction</code> will ever have <code>transaction.previousActions</code> filled out. <code>transaction.tables</code> will be populated when the transaction starts by reading through <code>transaction.previousActions</code> for <code>ChangeMetadataAction</code>s, and we will also add onto it when we create a table in the current transaction.</p> <p>We will append to <code>transaction.Actions</code> every time we write a new data file and every time we create a new table.</p> <p>We will add rows to <code>transaction.unflushedData</code> for a table until <code>transaction.unflushedDataPointer</code> for that table reaches <code>DATAOBJECT_SIZE</code>, at which point we will write that data to disk and add a <code>DataobjectAction</code> entry to <code>transaction.Actions</code>.</p> <h3 id="client-boilerplate">Client boilerplate</h3><p>A <code>client</code> will consist of an <code>objectStorage</code> implementation and a possibly empty <code>*transaction</code>. 
Empty meaning there is no current transaction.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">client</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="nx">objectStorage</span> <span class="w"> </span><span class="c1">// Current transaction, if any. Only one transaction per</span> <span class="w"> </span><span class="c1">// client at a time. All reads and writes must be within a</span> <span class="w"> </span><span class="c1">// transaction.</span> <span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">transaction</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">os</span><span class="w"> </span><span class="nx">objectStorage</span><span class="p">)</span><span class="w"> </span><span class="nx">client</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">client</span><span class="p">{</span><span class="nx">os</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">}</span> <span class="p">}</span> <span class="kd">var</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">errExistingTx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Existing Transaction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">errNoTx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span 
class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;No Transaction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">errTableExists</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Table Exists&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">errNoTable</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;No Such Table&quot;</span><span class="p">)</span> <span class="p">)</span> </pre></div> <h4 id="client-or-database?">Client or database?</h4><p>In a previous version of my code I named this <code>client</code> struct <code>database</code>. But that's misleading. There is no central database. There is just the client and the blob storage.</p> <p>Clients work with transactions directly, and only when a client attempts to commit does the blob storage abstraction let it know whether the transaction succeeded.</p> <h3 id="starting-a-transaction">Starting a transaction</h3><p>When we start a transaction, we will first read all existing transactions from disk and accumulate the actions from each prior transaction.</p> <p>We will interpret <code>ChangeMetadataAction</code>s and materialize them into a current view of all tables.</p> <p>And we will assign this transaction an ID that is 1 greater than the largest existing transaction ID we see.</p> <p>Again, it doesn't matter whether the <code>listPrefix</code> call we use returns an up-to-date list. Notably, blob storage offers few guarantees about the recency of LIST operations. 
The Delta Lake paper mentions this too.</p> <p>Out-of-date transactions attempting to write will be caught when we go to commit the transaction. Out-of-date transactions attempting only to read will still read a consistent snapshot.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">newTx</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errExistingTx</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">logPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;_log_&quot;</span> <span class="w"> </span><span class="nx">txLogFilenames</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">listPrefix</span><span class="p">(</span><span class="nx">logPrefix</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">transaction</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Action</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="kt">string</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="w"> </span><span class="p">=</span><span class="w"> 
</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">txLogFilename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">txLogFilenames</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">txLogFilename</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">oldTx</span><span class="w"> </span><span class="nx">transaction</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">oldTx</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Transaction metadata files are sorted</span> <span class="w"> </span><span class="c1">// lexicographically so that the most recent</span> <span class="w"> </span><span class="c1">// transaction (i.e. the one with the largest</span> <span class="w"> </span><span class="c1">// transaction id) will be last and tx.Id will end up</span> <span class="w"> </span><span class="c1">// 1 greater than the most recent transaction ID we</span> <span class="w"> </span><span class="c1">// see on disk.</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">oldTx</span><span class="p">.</span><span class="nx">Id</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">actions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">oldTx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span 
class="nx">actions</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">action</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">ChangeMetadata</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Store the latest version of</span> <span class="w"> </span><span class="c1">// each table in memory for</span> <span class="w"> </span><span class="c1">// easy lookup.</span> <span class="w"> </span><span class="nx">mtd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">ChangeMetadata</span> <span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nx">mtd</span><span class="p">.</span><span class="nx">Columns</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;unsupported action: %v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tx</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And we're set.</p> <h3 id="creating-a-table">Creating a table</h3><p>When we create a table, we need to add a <code>ChangeMetadataAction</code> to the transaction's <code>Actions</code>. We also want to add the table info to the in-memory <code>transaction.tables</code> field.</p> <p>We don't do any of this durably. 
The change here will be written to disk on commit (if the transaction succeeds).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">createTable</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errTableExists</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="c1">// Store it in the in-memory mapping.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">columns</span> <span class="w"> </span><span class="c1">// And also add it to the action history for future transactions.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">Action</span><span class="p">{</span> <span class="w"> </span><span class="nx">ChangeMetadata</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">ChangeMetadataAction</span><span class="p">{</span> <span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span> <span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Easy peasy. 
Now for the fun part: writing data!</p> <h3 id="writing-a-row">Writing a row</h3><p>This is the next area where we'll diverge from Delta Lake. For the sake of zero dependencies we are going to store data in-memory as an array of arrays of <code>any</code>. And when we later write rows to disk we'll write them as JSON. A real Delta Lake implementation would store data in-memory in Apache Arrow format, and write to disk as Parquet.</p> <p>In line with Delta Lake, though, we will buffer data in memory until we have 64K rows. Once we have 64K rows for a particular table we will flush all those rows to disk. (When we go to commit a transaction we will flush any outstanding rows.)</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">writeRow</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTable</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Try to find an unflushed/in-memory dataobject for this table</span> <span class="w"> </span><span class="nx">pointer</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="p">[</span><span 
class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span><span class="p">{}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">DATAOBJECT_SIZE</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Buffer is full: flush first, and bail out if the flush fails.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">][</span><span class="nx">pointer</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">row</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Now let's implement flushing.</p> <h3 id="flushing-a-data-object">Flushing a data object</h3><p>Recall that data objects in Delta Lake (and Iceberg) are immutable. 
Once we've got enough data to write a data object, we give it a unique name, write it to disk, and add an <code>AddDataobject</code> action to the transaction's list of <code>Actions</code>.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">dataobject</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Table</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">Data</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">Len</span><span class="w"> </span><span class="kt">int</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// First write out dataobject if there is anything to 
write out.</span> <span class="w"> </span><span class="nx">pointer</span><span class="p">,</span><span class="w"> </span><span class="nx">exists</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">exists</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">df</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dataobject</span><span class="p">{</span> <span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">uuidv4</span><span class="p">(),</span> <span class="w"> </span><span class="nx">Data</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span> <span class="w"> </span><span class="nx">Len</span><span class="p">:</span><span class="w"> </span><span class="nx">pointer</span><span class="p">,</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">df</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;_table_%s_%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">df</span><span class="p">.</span><span class="nx">Name</span><span class="p">),</span><span class="w"> </span><span class="nx">bytes</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Then record the newly written 
data file.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">Action</span><span class="p">{</span> <span class="w"> </span><span class="nx">AddDataobject</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">DataobjectAction</span><span class="p">{</span> <span class="w"> </span><span class="nx">Table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">df</span><span class="p">.</span><span class="nx">Name</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="c1">// Reset in-memory pointer.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>That's it for writing data! 
Let's now look at reading data.</p> <h3 id="scanning-a-table">Scanning a table</h3><p>We're going to make scanning mildly more complicated than it strictly needs to be for pedagogical code, because we'll have <code>client.scan()</code> return an iterator rather than an array with all rows.</p> <p>The <code>scanIterator</code> will first read from in-memory (unflushed) data. Then it will read through every data object for the table that is still a part of this transaction. We will know which data objects are still a part of this transaction by reading through all <code>AddDataobject</code> actions. A future version of this project would also eliminate data object files from the list by observing <code>DeleteDataobject</code> actions. But we don't do that in this post.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">scan</span><span class="p">(</span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">scanIterator</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errNoTx</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">allActions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="p">[</span><span class="nx">table</span><span class="p">]</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">allActions</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">dataobjects</span><span class="p">,</span><span class="w"> </span><span class="nx">action</span><span class="p">.</span><span class="nx">AddDataobject</span><span class="p">.</span><span class="nx">Name</span><span 
class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedData</span><span class="p">[</span><span class="nx">table</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">data</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">scanIterator</span><span class="p">{</span> <span class="w"> </span><span class="nx">unflushedRows</span><span class="p">:</span><span class="w"> </span><span class="nx">unflushedRows</span><span class="p">,</span> <span class="w"> </span><span class="nx">unflushedRowsLen</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">unflushedDataPointer</span><span class="p">[</span><span class="nx">table</span><span class="p">],</span> <span class="w"> </span><span class="nx">d</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">,</span> <span 
class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span> <span class="w"> </span><span class="nx">dataobjects</span><span class="p">:</span><span class="w"> </span><span class="nx">dataobjects</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The <code>scanIterator</code> needs to track where we are in in-memory rows, in data objects, and within a particular data object.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">scanIterator</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span> <span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="c1">// First we iterate through unflushed rows.</span> <span class="w"> </span><span class="nx">unflushedRows</span><span class="w"> </span><span class="p">[</span><span class="nx">DATAOBJECT_SIZE</span><span class="p">][]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">unflushedRowsLen</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="nx">unflushedRowPointer</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="c1">// Then we move through each dataobject.</span> <span class="w"> </span><span class="nx">dataobjects</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">dataobjectsPointer</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="c1">// And within each 
dataobject we iterate through rows.</span> <span class="w"> </span><span class="nx">dataobject</span><span class="w"> </span><span class="o">*</span><span class="nx">dataobject</span> <span class="w"> </span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="kt">int</span> <span class="p">}</span> </pre></div> <p>And the <code>scanIterator</code> will be driven by a <code>next()</code> method that goes through in-memory data first and then through what's on disk.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">readDataobject</span><span class="p">(</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">dataobject</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">read</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;_table_%s_%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">))</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">do</span><span class="w"> </span><span class="nx">dataobject</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">do</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">do</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> <span class="c1">// returns (nil, nil) when done</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">si</span><span class="w"> </span><span class="o">*</span><span class="nx">scanIterator</span><span class="p">)</span><span class="w"> </span><span class="nx">next</span><span class="p">()</span><span class="w"> </span><span class="p">([]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Iterate through in-memory rows first.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span 
class="p">.</span><span class="nx">unflushedRowPointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowsLen</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRows</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowPointer</span><span class="p">]</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">unflushedRowPointer</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// If we&#39;ve gotten through all dataobjects on disk we&#39;re done.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span 
class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjects</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="p">]</span> <span class="w"> </span><span class="nx">o</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">d</span><span class="p">.</span><span class="nx">readDataobject</span><span class="p">(</span><span class="nx">si</span><span class="p">.</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">o</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Valid row indexes are 0..Len-1, so advance to the next dataobject once the pointer reaches Len.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span 
class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="p">.</span><span class="nx">Len</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectsPointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">next</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobject</span><span class="p">.</span><span class="nx">Data</span><span class="p">[</span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="p">]</span> <span class="w"> </span><span class="nx">si</span><span class="p">.</span><span class="nx">dataobjectRowPointer</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>That's it for scanning a table! The final piece of the puzzle is committing a transaction.</p> <h3 id="committing-a-transaction">Committing a transaction</h3><p>When we commit a transaction we must flush any remaining data. 
A read-only transaction (one which has no <code>Actions</code>) is immediately done; it needs no concurrency check.</p> <p>Otherwise we serialize the transaction state and attempt to atomically <code>putIfAbsent</code> it as the log entry for this transaction's ID.</p> <p>Barring I/O errors, the only way this fails is if a concurrent writer has already committed a log entry with the same ID.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">client</span><span class="p">)</span><span class="w"> </span><span class="nx">commitTx</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">errNoTx</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Flush any outstanding data</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">tables</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">flushRows</span><span class="p">(</span><span class="nx">table</span><span class="p">)</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">wrote</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">actions</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Actions</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">actions</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">wrote</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Read-only transaction, no need to do a concurrency 
check.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">wrote</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">filename</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;_log_%020d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">Id</span><span class="p">)</span> <span class="w"> </span><span class="c1">// We won&#39;t store previous actions, they will be recovered on</span> <span class="w"> </span><span class="c1">// new transactions. So unset them. 
Honestly not totally</span> <span class="w"> </span><span class="c1">// clear why.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">.</span><span class="nx">previousActions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="nx">bytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">os</span><span class="p">.</span><span class="nx">putIfAbsent</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="p">)</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">tx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>This is the crux of Delta Lake. It's simple. And honestly it's a bit shocking. Real Delta Lake does support automatic retries in some cases. But primarily you are limited to one writer per table at a time, even if the writers are writing non-conflicting rows. Iceberg is basically the same here; it's just how metadata is tracked that differs.</p> <p class="note"> As mentioned in another note above, our implementation is actually stricter than Delta Lake since it manages all table transaction logs together. This means you can get snapshot isolation across all tables (which Delta Lake doesn't support), but it also means significantly more contention and more failed write transactions. </p><p>The Delta Lake and Iceberg folks apparently wanted to avoid FoundationDB (i.e. the Snowflake architecture, which is mentioned in the Delta Lake paper) so much that they'd give up row-level concurrency to be mostly serverless.</p> <p>Is it worth it? Dunno. Delta Lake and Iceberg are getting massive adoption. Many very smart people have worked, and continue to work, on both. Moreover, it is apparently what the market wants. Every database-like product is implementing, or is planning to implement, Delta Lake or Iceberg.</p> <h3 id="trying-it-out">Trying it out</h3><p>Let's add a test in <code>main_test.go</code> to see what happens with concurrent writers. 
Follow the comments and debug logs for details:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;testing&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">TestConcurrentTableWriters</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirTemp</span><span class="p">(</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;test-database&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// RemoveAll because the directory is not empty once the test has written to it.</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">RemoveAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span> <span class="w"> </span><span class="nx">fos</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Have c2Writer start up a transaction.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not start first c2 tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2] new tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But then have c1Writer start a transaction and commit it first.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could 
not start first c1 tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1] new tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;b&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not create x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1] Created table&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Joey&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span 
class="s">&quot;could not write first row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1] Wrote row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Yue&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not write second row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1] Wrote row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not commit tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1] Committed tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Now go back to c2 and 
write data.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;b&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not create x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2] Created table&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Holly&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not write first row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2] Wrote 
row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;concurrent commit must fail&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2] tx not committed&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Try it out:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>otf <span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>tidy <span class="gp">$ </span>go<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-run<span class="w"> </span>TestConcurrentTableWriters<span class="w"> </span>--<span class="w"> </span>--debug <span class="go">[DEBUG] [c2] new tx</span> <span class="go">[DEBUG] [c1] new tx</span> <span class="go">[DEBUG] [c1] Created table</span> <span class="go">[DEBUG] [c1] Wrote row</span> <span class="go">[DEBUG] [c1] Wrote row</span> <span class="go">[DEBUG] [c1] Committed tx</span> <span class="go">[DEBUG] [c2] Created table</span> <span class="go">[DEBUG] [c2] Wrote row</span> <span class="go">[DEBUG] [c2] tx not committed</span> <span class="go">PASS</span> <span class="go">ok otf 0.311s</span> </pre></div> <p>That's pretty cool.</p> <p>And what about a reader and concurrent writer? Observe that the reader always reads a snapshot. 
Follow the comments again for details:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestConcurrentReaderWithWriterReadsSnapshot</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirTemp</span><span class="p">(</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;test-database&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// RemoveAll because the directory is not empty once the test has written to it.</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">RemoveAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span> <span class="w"> </span><span class="nx">fos</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newFileObjectStorage</span><span class="p">(</span><span class="nx">dir</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1Writer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span 
class="p">(</span><span class="nx">fos</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2Reader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newClient</span><span class="p">(</span><span class="nx">fos</span><span class="p">)</span> <span class="w"> </span><span class="c1">// First create some data and commit the transaction.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not start first c1 tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Started tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">createTable</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;a&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;b&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not create x&quot;</span><span class="p">)</span> <span class="w"> 
</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Created table&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Joey&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not write first row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Wrote row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Yue&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not write second row&quot;</span><span class="p">)</span> 
<span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Wrote row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not commit tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Committed tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Now start a new transaction for more edits.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not start second c1 tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Starting new write tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Before we commit this second write-transaction, start a</span> <span class="w"> </span><span class="c1">// read transaction.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">c2Reader</span><span class="p">.</span><span class="nx">newTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not start c2 tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2Reader] Started tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Write and commit rows in c1.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">writeRow</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;Ada&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not write third row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Wrote third row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Scan x in read-only transaction</span> <span class="w"> </span><span class="nx">it</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2Reader</span><span 
class="p">.</span><span class="nx">scan</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not scan x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2Reader] Started scanning&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">it</span><span class="p">.</span><span class="nx">next</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not iterate x scan&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2Reader] Done scanning&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2Reader] Got row in reader tx&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;Joey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">1.0</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;Yue&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">2.0</span><span class="p">,</span><span class="w"> 
</span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">seen</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">seen</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;expected two rows&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Scan x in c1 write transaction</span> <span class="w"> </span><span class="nx">it</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">scan</span><span class="p">(</span><span class="s">&quot;x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not scan x in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Started scanning&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">it</span><span class="p">.</span><span 
class="nx">next</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not iterate x scan in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Done scanning&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Got row in tx&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;Ada&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Since this hasn&#39;t been serialized to JSON, it&#39;s still an int not a float.</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span 
class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">seen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;Joey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="mf">1.0</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;Yue&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">row</span><span class="p">[</span><span class="mi">1</span><span 
class="p">],</span><span class="w"> </span><span class="mf">2.0</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;row mismatch in c1&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">seen</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">seen</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;expected three rows&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Writer committing should succeed.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1Writer</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;could not commit second tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c1Writer] Committed tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Reader committing should succeed.</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2Reader</span><span class="p">.</span><span class="nx">commitTx</span><span class="p">()</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;could not commit read-only tx&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;[c2Reader] Committed tx&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Run it:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span><span class="nb">test</span><span class="w"> </span>-run<span class="w"> </span>TestConcurrentReaderWithWriterReadsSnapshot<span class="w"> </span>--<span class="w"> </span>--debug <span class="go">[DEBUG] [c1Writer] Started tx</span> <span class="go">[DEBUG] [c1Writer] Created table</span> <span class="go">[DEBUG] [c1Writer] Wrote row</span> <span class="go">[DEBUG] [c1Writer] Wrote row</span> <span class="go">[DEBUG] [c1Writer] Committed tx</span> <span class="go">[DEBUG] [c1Writer] Starting new write tx</span> <span class="go">[DEBUG] [c2Reader] Started tx</span> <span class="go">[DEBUG] [c1Writer] Wrote third row</span> <span class="go">[DEBUG] [c2Reader] Started scanning</span> <span class="go">[DEBUG] [c2Reader] Got row in reader tx [Joey 1]</span> <span class="go">[DEBUG] [c2Reader] Got row in reader tx [Yue 2]</span> <span class="go">[DEBUG] [c2Reader] Done scanning</span> <span class="go">[DEBUG] [c1Writer] Started scanning</span> <span class="go">[DEBUG] [c1Writer] Got row in tx [Ada 3]</span> <span class="go">[DEBUG] [c1Writer] Got row in tx [Joey 1]</span> <span class="go">[DEBUG] [c1Writer] Got row in tx [Yue 2]</span> <span class="go">[DEBUG] [c1Writer] Done scanning</span> <span class="go">[DEBUG] [c1Writer] Committed tx</span> <span class="go">[DEBUG] [c2Reader] Committed tx</span> <span class="go">PASS</span> <span class="go">ok otf 0.252s</span> </pre></div> <p>Sweet.</p> <h3 id="what's-next?">What's next?</h3><p>As mentioned, we didn't touch a lot of things. 
For example: handling updates and deletes, transaction log checkpoints, and data object compaction.</p> <p>Take a close look at the <a href="https://www.vldb.org/pvldb/vol13/p3411-armbrust.pdf">Delta Lake paper</a> and the <a href="https://github.com/delta-io/delta/blob/master/PROTOCOL.md">Delta Lake Spec</a> and see what you can do!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Build a serverless ACID database with this one neat trick.<br><br>(New blog post)<a href="https://t.co/rHgfKSPY6q">https://t.co/rHgfKSPY6q</a> <a href="https://t.co/1hmjsxIk6w">pic.twitter.com/1hmjsxIk6w</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1840474893491560777?ref_src=twsrc%5Etfw">September 29, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-09-29-build-a-serverless-acid-database-with-this-one-neat-trick.htmlSun, 29 Sep 2024 00:00:00 +0000Be someone who does thingshttp://notes.eatonphil.com/2024-09-23-be-someone-who-does-things.html<p>I <a href="https://notes.eatonphil.com/2024-08-24-obsession.html">wrote last month</a> that <em>what you want to do</em> is one of the most useful motivations in life. I want to follow that up by saying that the only thing more important than wanting to do something is to <em>actually</em> do something.</p> <p>The most valuable trait you can develop for yourself is to be consistent. It is absolutely something you can develop. And moreover it's kind of hard to believe that for anyone it is innate.</p> <p>I meet so many people who say they want to do things. And I ask them what they're doing to get there and they get flustered. This is completely understandable.</p> <p>I meet so many students who feel overwhelmed by what everyone else is doing. This is also understandable.</p> <p>But it doesn't matter what anyone else is doing. It doesn't matter where anyone else is at. It matters where you are at. 
Compete with yourself before you compete with anyone else. What matters is that you get into a habit of consistently working on little goals.</p> <p>If you pick something that is too complex, break it down. Keep on breaking problems or ideas down until you find a problem or idea you can solve.</p> <p>Then keep on finding new problems to solve. Move on in complexity over time as you can and want to.</p> <p>Don't worry about getting things perfect. Who can discredit you for doing your best? What shame is there when you're being earnest? The only thing that makes sense to feel bad about is not <em>trying to do</em> what you <em>genuinely wanted to do</em>.</p> <p>And this doesn't have to be about projects or ideas outside of work. There may be things you want to do at work like improving documentation or writing better tests or adding new checks to code or blogging or interviewing customers or working with another team.</p> <p>Like I said in <a href="https://notes.eatonphil.com/2024-08-24-obsession.html">Obsession</a>, don't worry about what you do daily. That is too frequent to think about. Instead think about what you're doing once a month.</p> <p>Make time once a month to publish a post or complete a small project. Whatever you want to do, I am confident you can find some small version of it that you could commit to doing once a month. Be consistent!</p> <p>If a month is too often, pick a longer interval. Find whatever cadence and whatever size of project allows you to be consistent.</p> <p>When you're consistent over the course of months I think you'll be astounded at what you accomplish in a year.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Shorter post tonight, may add to this later on.<br><br>Be someone who does things. 
And do these (little) things consistently.<a href="https://t.co/oVb6Sz8eEK">https://t.co/oVb6Sz8eEK</a> <a href="https://t.co/kNrZQ4pvTN">pic.twitter.com/kNrZQ4pvTN</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1838378171005128910?ref_src=twsrc%5Etfw">September 24, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-09-23-be-someone-who-does-things.htmlMon, 23 Sep 2024 00:00:00 +0000Obsessionhttp://notes.eatonphil.com/2024-08-24-obsession.html<p>In your professional and personal life, I don't believe there is a stronger motivation than having something in mind and the desire to do it. Yet the natural way to deal with a desire to do something is to justify why it's not possible.</p> <p>"I want to read more books but nobody reads books these days so how could I."</p> <p>"I want to write for a magazine but I have no experience writing professionally."</p> <p>"I want to build a company someday but how could someone of my background."</p> <p>Our official mentors, our managers, through a combination of well-intentioned defeatism and well-intentioned lack of accomplishment themselves, among other things, are often unable to process big goals or guide you toward them.</p> <p>I've been one of these managers myself. In fact I have, to my immense regret, tried too often to convince people to do what is practical rather than what they want to do. Or to do what I judged they were capable of doing rather than what they wanted to do.</p> <p>In the best cases, my listener had the self-confidence to ignore me. They did what they wanted to do anyway. In the worst case, again to my deep regret, I've been a well-intentioned part of derailing someone's career for years.</p> <p>So I don't want to convince anyone of anything anymore. If I start trying to convince someone by accident, I try to catch myself. I try to avoid sentences like "I think you should …". 
Instead "Here is something that's worked for me: …" or "Here is what I've heard works well for other people: …".</p> <p>Nobody wants to be convinced. But intelligent people will change their mind when exposed to new facts or different ideas. Being convinced is a battle of will. Changing one's mind is a purely personal decision.</p> <p>There are certainly people with discipline who can grind on things they hate doing and eventually become experts at it. But more often I see people grind on things they hate only to become depressed and give up.</p> <p>For most of us, our best hope is (healthy) obsession. And obsession, in the sense I'm talking about, does not come from something you are ambivalent about or hate. Obsession can only come when you're doing something you actually want to do.</p> <p>For big goals or big changes, you need regular commitment weekly, monthly, yearly. Over the course of years. And only obsession makes that work not actually feel like work. Obsession is the only thing that makes discipline not feel like discipline.</p> <p>That big goals take years to accomplish need not be scary. Obsession doesn't mean you can't pivot. There is quite a lot to gain by committing to something regularly over the course of years even if you decide to stop and commit from then on to something else. You will learn a good deal.</p> <p>And healthy obsession to me is more specifically measurable on the order of weeks, not hours or days. Healthy obsession means you're still building healthy personal and professional relationships. You're still taking care of yourself, emotionally and physically.</p> <p>I do not have high expectations for people in general. This seems healthy and reasonable. But as I meet more people and observe them over the years, I am only more convinced of the vast potential of individuals. Individuals are almost universally underestimated.</p> <p>I think you can do almost anything you want to do. 
If you commit to doing it.</p> <p>I'll end this with a personal story.</p> <p>Until 11th grade, I hated school. I hated the rigidity. Being forced to be somewhere for hours and to follow so many rules. I skipped so many days of school I'm embarrassed by it. I'd never do homework at home. I never studied for tests. I got Bs and Cs in the second-tier classes. I was in the orchestra for 6 years and never practiced at home. I was not cool enough to be a "bad kid" but I did not understand the system and had no discipline whatsoever.</p> <p>I found out at the end of 10th grade that I could actually afford college if I got into a good enough school that paid full needs-based tuition. It sounded significantly better than the only other option that seemed obvious, joining the military as a recruit. I realized and decided that if I wanted to get into a good school I needed to not half-ass things.</p> <p>Somehow, I decided to only do things I could become obsessed with. And I decided to be obsessed in the way that I wanted, not to do what everyone else did (which I basically could not do since I had no discipline). If we covered a topic in class, I'd read news about it or watch movies about it. I'd get myself excited about the topic in every way I could.</p> <p>It basically worked out. I ended high school in the top 10% of the class (up from top 40% or something). I got into a good liberal arts college that paid the entirety of my tuition. But I remained a basically lazy and undisciplined person. I never stayed up late studying for a test. I dropped out after a year and a half for family reasons.</p> <p>But I've now spent the last 10 years in my spare time working on compiler projects, interpreter projects, parser projects, database projects, distributed systems projects. I've spent the last 6 years consistently publishing at least one blog post per month.</p> <p>I didn't want to work the way everyone else worked. 
I wanted to be obsessed about what I worked on.</p> <p>Obsession has made all of this into something I now barely register as doing. It's allowed me to continue adding activities like organizing book clubs and meetups to the list of things I'm up to. Up until basically this year I could have in good faith said I am a very lazy and undisciplined person. But obsession turned me into someone with discipline.</p> <p>Obsession became about more than just the tech. It meant trying to fully understand the product, the users, the market. It meant thinking more carefully about product documentation, user interfaces, company messaging. Obsession meant reflecting on how I treat my coworkers, and how my coworkers feel treated by others in general. Obsession meant wanting an equitable and encouraging work environment for everyone.</p> <p>And, as I said, it's about healthy obsession. I didn't really understand the "healthy" part until a few years ago. But I'm now convinced that the "healthy" part is as important as the "obsession" part. To go to the gym regularly. To play pickup volleyball. To cook excellent food. To read fiction and poetry and play music. To serve the community. To be friendly and encouraging to all people. To meet new people and build better genuine friendships.</p> <p>And in the context of work, "healthy obsession" means understanding you can't do everything, even while you care about everything. It means accepting that you make mistakes and that you do your best; that you try to do better and learn from mistakes the next time.</p> <p>It's got to be sustainable. And we can develop a healthy obsession while we have quite a bit of fun too. :)</p>
http://notes.eatonphil.com/2024-08-24-obsession.htmlSat, 24 Aug 2024 00:00:00 +0000What's the big deal about Deterministic Simulation Testing?http://notes.eatonphil.com/2024-08-20-deterministic-simulation-testing.html<p>Bugs in distributed systems are hard to find, largely because systems interact in chaotic ways. And even once you've found a bug, it can be anywhere from simple to impossible to reproduce it. It's about as far away as you can get from the ideal test environment: property testing a pure function.</p> <p>But what if we could write our code in a way that we can isolate the chaotic aspects of our distributed system during <i>testing</i>: run multiple systems communicating with each other on a <i>single thread</i> and control all randomness in each system? And property test this single-threaded version of the distributed system with controlled randomness, all the while injecting faults (fancy term for unhappy path behavior like errors and latency) we might see in the real-world?</p> <p>Crazy as it sounds, people actually do this. It's called Deterministic Simulation Testing (DST). And it's become more and more popular with startups like FoundationDB, Antithesis, TigerBeetle, Polar Signals, and WarpStream; as well as folks like Tyler Neely and Pekka Enberg, talking about and making use of this technique.</p> <p>It has become so popular to talk about DST in my corner of the world that I worry it risks coming off sounding too magical and maybe a little hyped. 
It's worth getting a better understanding of both the benefits and the limitations.</p> <p>Thank you to <a href="https://www.linkedin.com/in/alexmillerdb/">Alex Miller</a> and <a href="https://www.linkedin.com/in/will-wilson-330276112/">Will Wilson</a> for reviewing a version of this post.</p> <h3 id="randomness-and-time">Randomness and time</h3><p>A big source of non-determinism in business logic is the use of random numbers—in your code or your transitive dependencies or your language runtime or your operating system.</p> <p>Crucially, DST does not imply you can't have randomness! DST merely assumes that you have a global seed for all randomness in your program and that the simulator controls the seed. The seed may change across runs of the simulator.</p> <p>Once you observe a bad state as a result of running the simulation on a random seed, you allow the user to enter the same seed again. This allows the user to recreate the entire program run that led to that observed bad state and to debug the program against a reproducible failure.</p> <p>Another big source of non-determinism is being dependent on time. As with randomness, DST does not mean you can't depend on time. DST means you must be able to control the clock during the simulation.</p> <p>To "control" randomness or time basically means you support dependency injection, or the old-school alternative to dependency injection called <i>passing the dependency as an explicit parameter</i>. 
Rather than referring to a global clock or a global seed, you need to be able to receive a clock or a seed from someone.</p> <p>For example we might separate the operation of an application into the language's <code>main()</code> entrypoint and an actual application <code>start()</code> entrypoint.</p> <div class="highlight"><pre><span></span><span class="c1"># app.pseudocode</span> <span class="k">def</span> <span class="nf">start</span><span class="p">(</span><span class="n">clock</span><span class="p">,</span> <span class="n">seed</span><span class="p">):</span> <span class="c1"># lots of business logic that might depend on time or do random things</span> <span class="k">def</span> <span class="nf">main</span><span class="p">:</span> <span class="n">clock</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">clock</span><span class="p">()</span> <span class="n">seed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="n">app</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">clock</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span> </pre></div> <p>The application entrypoint is where we must be able to swap out a real clock or real random seed for one controlled by our simulator:</p> <div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span> <span class="kn">import</span> <span class="s2">&quot;app.pseudocode&quot;</span> <span class="k">def</span> <span class="nf">main</span><span class="p">:</span> <span class="n">sim_clock</span> <span class="o">=</span> <span class="n">make_sim_clock</span><span class="p">()</span> <span class="n">sim_seed</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span 
class="ow">or</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="k">try</span><span class="p">:</span> <span class="n">app</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_clock</span><span class="p">,</span> <span class="n">sim_seed</span><span class="p">)</span> <span class="n">catch</span><span class="p">(</span><span class="n">e</span><span class="p">):</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Bad execution at seed: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">sim_seed</span><span class="p">)</span> <span class="n">throw</span> <span class="n">e</span> </pre></div> <p>Let's look at another example.</p> <h3 id="converting-an-existing-function">Converting an existing function</h3><p>Let's say that we had a helper method that kept calling a function until it succeeded, with backoff.</p> <div class="highlight"><pre><span></span><span class="c1"># retry.pseudocode</span> <span class="k">class</span> <span class="nc">Backoff</span><span class="p">:</span> <span class="k">def</span> <span class="nf">init</span><span class="p">:</span> <span class="n">this</span><span class="o">.</span><span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">())</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">async</span> <span class="k">def</span> <span class="nf">retry_backoff</span><span class="p">(</span><span class="n">f</span><span class="p">):</span> <span class="k">while</span> <span class="n">this</span><span 
class="o">.</span><span class="n">tries</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">:</span> <span class="k">if</span> <span class="n">f</span><span class="p">():</span> <span class="k">return</span> <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">rnd</span><span class="o">.</span><span class="n">gen</span><span class="p">())</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span><span class="o">++</span> </pre></div> <p>There is a single source of nondeterminism here and it's where we generate a seed. We could parameterize the seed, but since we want to call <code>time.sleep()</code> and since in DST we control the time, we can just parameterize <code>time</code>.</p> <div class="highlight"><pre><span></span><span class="c1"># retry.pseudocode</span> <span class="k">class</span> <span class="nc">Backoff</span><span class="p">:</span> <span class="k">def</span> <span class="nf">init</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">time</span><span class="p">):</span> <span class="n">this</span><span class="o">.</span><span class="n">time</span> <span class="o">=</span> <span class="n">time</span> <span class="n">this</span><span class="o">.</span><span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span> <span class="o">=</span> <span class="n">this</span><span class="o">.</span><span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">())</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">async</span> <span class="k">def</span> <span 
class="nf">retry_backoff</span><span class="p">(</span><span class="n">this</span><span class="p">,</span> <span class="n">f</span><span class="p">):</span> <span class="k">while</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span> <span class="o">&lt;</span> <span class="mi">3</span><span class="p">:</span> <span class="k">if</span> <span class="n">f</span><span class="p">():</span> <span class="k">return</span> <span class="k">await</span> <span class="n">this</span><span class="o">.</span><span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">rnd</span><span class="o">.</span><span class="n">gen</span><span class="p">())</span> <span class="n">this</span><span class="o">.</span><span class="n">tries</span><span class="o">++</span> </pre></div> <p>Now we can write a little simulator to test this:</p> <div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span> <span class="kn">import</span> <span class="s2">&quot;retry.pseudocode&quot;</span> <span class="n">seed</span> <span class="o">=</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="ow">or</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> <span class="n">sim_time</span> <span class="o">=</span> <span class="p">{</span> <span class="n">now</span><span class="p">:</span> <span class="mi">0</span> <span class="n">sleep</span><span class="p">:</span> <span class="p">(</span><span class="n">ms</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">await</span> <span class="n">future</span><span class="o">.</span><span class="n">wait</span><span class="p">(</span><span class="n">ms</span><span class="p">)</span> <span class="p">}</span> <span class="n">tick</span><span class="p">:</span> <span class="p">(</span><span class="n">ms</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="n">now</span> <span class="o">+=</span> <span class="n">ms</span> <span class="p">}</span> <span class="n">backoff</span> <span class="o">=</span> <span 
class="n">Backoff</span><span class="p">(</span><span class="n">sim_time</span><span class="p">)</span> <span class="k">while</span> <span class="n">true</span><span class="p">:</span> <span class="n">failures</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">f</span> <span class="o">=</span> <span class="p">()</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">0.5</span><span class="p">:</span> <span class="n">failures</span><span class="o">++</span> <span class="k">return</span> <span class="n">false</span> <span class="k">return</span> <span class="n">true</span> <span class="p">}</span> <span class="k">try</span><span class="p">:</span> <span class="k">while</span> <span class="n">sim_time</span><span class="o">.</span><span class="n">now</span> <span class="o">&lt;</span> <span class="mi">60</span><span class="nb">min</span><span class="p">:</span> <span class="n">promise</span> <span class="o">=</span> <span class="n">backoff</span><span class="o">.</span><span class="n">retry_backoff</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">sim_time</span><span class="o">.</span><span class="n">tick</span><span class="p">(</span><span class="mi">1</span><span class="n">ms</span><span class="p">)</span> <span class="k">if</span> <span class="n">promise</span><span class="o">.</span><span class="n">read</span><span class="p">():</span> <span class="k">break</span> <span class="n">assert_expect_failure_and_expected_time_elapse</span><span class="p">(</span><span class="n">sim_time</span><span class="p">,</span> <span class="n">failures</span><span class="p">)</span> <span class="n">catch</span><span class="p">(</span><span class="n">e</span><span class="p">):</span> <span class="nb">print</span><span 
class="p">(</span><span class="s2">&quot;Found logical error with seed: </span><span class="si">%d</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span> <span class="n">throw</span> <span class="n">e</span> </pre></div> <p>This demonstrates a few critical aspects of DST. First, the simulator itself depends on randomness, but it lets the user provide a seed so they can replay a simulation that discovers a bug. The controlled randomness in the simulator is what lets us do property testing.</p> <p>Second, the simulation workload must be written by the user. Even when you've got a platform like Antithesis that gives you an environment for DST, it's up to you to exercise the application.</p> <p>Now let's get a little more complex.</p> <h3 id="a-single-thread-and-asynchronous-io">A single thread and asynchronous IO</h3><p>The determinism of multiple threads can only be controlled at the operating system or emulator or hypervisor layer. Realistically, that would require third-party systems like Antithesis or <a href="https://github.com/facebookexperimental/hermit">Hermit</a> (which, don't get excited, is not actively developed and hasn't worked on any interesting program of mine) or <a href="https://rr-project.org/">rr</a>.</p> <p>These systems transparently transform multi-threaded code into single-threaded code. But also note that Hermit and rr have only limited ability to do fault injection which, in addition to deterministic execution, is a goal of ours. And you can't run them on a Mac. And you <a href="https://github.com/rr-debugger/rr/issues/1373">can't</a> <a href="https://github.com/facebookexperimental/hermit?tab=readme-ov-file#support">run</a> them on ARM.</p> <p>But we can, and would like to, write a simulator without writing a new operating system or emulator or hypervisor, and without a third-party system. So we must limit ourselves to writing code that can be collapsed into a single thread. 
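</p> <p>As a rough illustration of that constraint, here is a minimal Python sketch, not taken from any of the projects above. It assumes generators stand in for tasks and that a hypothetical <code>run()</code> scheduler uses a seeded random number generator, rather than the operating system, to pick which task steps next:</p>
<div class="highlight"><pre>import random


def worker(name, steps, trace):
    # Each yield marks a point where the scheduler may switch tasks.
    for i in range(steps):
        trace.append((name, i))
        yield


def run(seed):
    # Hypothetical scheduler: a seeded RNG, not the OS, picks what runs next.
    rng = random.Random(seed)
    trace = []
    tasks = [worker("a", 3, trace), worker("b", 3, trace)]
    while tasks:
        task = rng.choice(tasks)
        try:
            next(task)
        except StopIteration:
            tasks.remove(task)
    return trace
</pre></div>
<p>Calling <code>run()</code> twice with the same seed replays the same interleaving, while a different seed may explore a different one.</p> <p>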
Significantly, since using blocking IO would mean an entire class of concurrency bugs could not be discovered while running the simulator in a single thread, we must limit ourselves to asynchronous IO.</p> <p>Single-threaded and asynchronous IO. These are already two big limitations.</p> <p>Some languages like Go are entirely built around transparent multi-threading and blocking IO. Polar Signals <a href="https://www.polarsignals.com/blog/posts/2024/05/28/mostly-dst-in-go">solved</a> this for DST by compiling their application to WASM where it would run on a single thread. But that wasn't enough. Even on a single thread, the Go runtime intentionally schedules goroutines randomly. So Polar Signals forked the Go runtime to control this randomness with an environment variable. That's kind of crazy. Resonate took <a href="https://github.com/resonatehq/resonate/blob/268c588e302f13187309e4b37636d19595d42fa1/internal/kernel/scheduler/coroutine.go">another approach</a> that also looks cumbersome. I'm not going to attempt to describe it. Go seems like a difficult choice of language if you want to do DST.</p> <p>Like Go, Rust has no built-in async IO. The most mature async IO library is tokio. The tokio folks attempted to provide a tokio-compatible <a href="https://github.com/tokio-rs/simulator">simulator</a> implementation with all sources of nondeterminism removed. From what I can tell, they did not at any point fully <a href="https://github.com/tokio-rs/tokio/issues/1845">succeed</a>. That repo has now been replaced with a "this is very experimental" tokio-rs project called <a href="https://github.com/tokio-rs/turmoil">turmoil</a> that provides deterministic execution plus network fault injection. (But not disk fault injection. More on that later.) It isn't surprising that it is difficult to provide deterministic execution for an IO library that was not designed for it. tokio is a large project with many transitive dependencies. 
They must all be combed for non-determinism.</p> <p>On the other hand, Pekka has <a href="https://github.com/penberg/hiisi/blob/main/hiisi-server/src/io/generic.rs">already demonstrated</a> for us how we might build a simpler Rust async IO library that is designed to be simulation tested. He modeled this on the TigerBeetle design King and I <a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue">wrote</a> about two years ago.</p> <p>So let's sketch out a program that does buggy IO and let's look at how we can apply DST to it.</p> <div class="highlight"><pre><span></span><span class="c1"># readfile.pseudocode</span> <span class="k">def</span> <span class="nf">read_file</span><span class="p">(</span><span class="n">io</span><span class="p">,</span> <span class="n">name</span><span class="p">,</span> <span class="n">into_buffer</span><span class="p">):</span> <span class="n">f</span> <span class="o">=</span> <span class="k">await</span> <span class="n">io</span><span class="o">.</span><span class="n">open</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="n">read_buffer</span> <span class="o">=</span> <span class="p">[</span><span class="mi">4096</span><span class="p">]</span><span class="n">u8</span><span class="p">{}</span> <span class="k">while</span> <span class="n">true</span><span class="p">:</span> <span class="n">err</span><span class="p">,</span> <span class="n">n_read</span> <span class="o">=</span> <span class="k">await</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="o">&amp;</span><span class="n">read_buffer</span><span class="p">)</span> <span class="k">if</span> <span class="n">err</span> <span class="o">==</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span><span class="p">:</span> <span class="n">into_buffer</span><span class="o">.</span><span 
class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span> <span class="k">return</span> <span class="k">if</span> <span class="n">err</span><span class="p">:</span> <span class="n">throw</span> <span class="n">err</span> <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span> </pre></div> <p>In our simulator, we will provide a mocked-out IO system and we will randomly inject various errors while asserting pre- and post-conditions.</p> <div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span> <span class="kn">import</span> <span class="s2">&quot;readfile.pseudocode&quot;</span> <span class="n">seed</span> <span class="o">=</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="err">?</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span><span class="p">)</span> <span class="p">:</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> <span class="k">while</span> <span class="n">true</span><span class="p">:</span> <span 
class="n">sim_disk_data</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="n">MB</span><span class="p">)</span> <span class="n">sim_fd</span> <span class="o">=</span> <span class="p">{</span> <span class="n">pos</span><span class="p">:</span> <span class="mi">0</span> <span class="n">EOF</span><span class="p">:</span> <span class="n">Error</span><span class="p">(</span><span class="s2">&quot;eof&quot;</span><span class="p">)</span> <span class="n">read</span><span class="p">:</span> <span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="n">partial_read</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span> <span class="n">memcpy</span><span class="p">(</span><span class="n">sim_disk_data</span><span class="p">,</span> <span class="n">buf</span><span class="p">,</span> <span class="n">fd</span><span class="o">.</span><span class="n">pos</span><span class="p">,</span> <span class="n">partial_read</span><span class="p">)</span> <span class="n">fd</span><span class="o">.</span><span class="n">pos</span> <span class="o">+=</span> <span class="n">partial_read</span> <span class="k">if</span> <span class="n">fd</span><span class="o">.</span><span class="n">pos</span> <span class="o">==</span> <span class="n">sizeof</span><span class="p">(</span><span class="n">sim_disk_data</span><span class="p">):</span> <span class="k">return</span> <span class="n">io</span><span class="o">.</span><span class="n">EOF</span><span class="p">,</span> <span 
class="n">partial_read</span> <span class="k">return</span> <span class="n">null</span><span class="p">,</span> <span class="n">partial_read</span> <span class="p">}</span> <span class="p">}</span> <span class="n">sim_io</span> <span class="o">=</span> <span class="p">{</span> <span class="nb">open</span><span class="p">:</span> <span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="n">sim_fd</span> <span class="p">}</span> <span class="n">out_buf</span> <span class="o">=</span> <span class="n">Vector</span><span class="o">&lt;</span><span class="n">u8</span><span class="o">&gt;.</span><span class="n">new</span><span class="p">()</span> <span class="k">try</span><span class="p">:</span> <span class="n">read_file</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="s2">&quot;somefile&quot;</span><span class="p">,</span> <span class="n">out_buf</span><span class="p">)</span> <span class="n">assert_bytes_equal</span><span class="p">(</span><span class="n">out_buf</span><span class="o">.</span><span class="n">data</span><span class="p">,</span> <span class="n">sim_disk_data</span><span class="p">)</span> <span class="n">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">):</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Found logical error with seed: </span><span class="si">%d</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span> <span class="n">throw</span> <span class="n">e</span> </pre></div> <p>And with this simulator we would have eventually caught our partial read bug! 
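</p> <p>To make that concrete, here is a small runnable Python sketch of the same idea. It is only an approximation of the pseudocode: the names <code>SimFile</code>, <code>read_file</code> and <code>simulate</code> are made up for illustration, the buffer is shrunk to 8 bytes so short reads are common, and a read of 0 bytes (rather than an EOF error) is assumed to signal end of file:</p>
<div class="highlight"><pre>import random

CHUNK = 8  # tiny read buffer so short reads are common


class SimFile:
    # Hypothetical in-memory file; read() may return fewer bytes than asked.
    def __init__(self, data, rng):
        self.data, self.pos, self.rng = data, 0, rng

    def read(self, buf):
        remaining = len(self.data) - self.pos
        if remaining == 0:
            return 0  # assumption: 0 bytes read means end of file
        n = self.rng.randint(1, min(len(buf), remaining))
        buf[:n] = self.data[self.pos:self.pos + n]
        self.pos += n
        return n


def read_file(f, use_n_read):
    out, buf = bytearray(), bytearray(CHUNK)
    while True:
        n = f.read(buf)
        if n == 0:
            return bytes(out)
        # The buggy variant appends the whole buffer and ignores n.
        out += buf[:n] if use_n_read else buf


def simulate(seed, use_n_read):
    # Returns True when the bytes read back match the simulated disk.
    rng = random.Random(seed)
    data = bytes(rng.randrange(256) for _ in range(64))
    return read_file(SimFile(data, rng), use_n_read) == data
</pre></div>
<p>With <code>use_n_read=True</code> the check should hold for every seed. With <code>use_n_read=False</code> almost any seed produces a mismatch, and rerunning with that seed replays the exact same sequence of short reads.</p> <p>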
In our original program when we wrote:</p> <div class="highlight"><pre><span></span> <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">sizeof</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">)])</span> </pre></div> <p>We should have written:</p> <div class="highlight"><pre><span></span> <span class="n">into_buffer</span><span class="o">.</span><span class="n">copy_maybe_allocate</span><span class="p">(</span><span class="n">read_buffer</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="n">n_read</span><span class="p">])</span> </pre></div> <p>Great! Let's get a little more complex.</p> <h3 id="a-distributed-system">A distributed system</h3><p>I already mentioned in the beginning that the gist of deterministic simulation testing a distributed system is that you get all of the nodes in the system to run in the same process. This would be basically impossible if you wanted to test a system that involved your application plus Kafka plus Postgres plus Redis. 
But if your system is a self-contained distributed system, such as one that embeds a Raft library for high availability of your application, you can actually run multiple nodes in the same process!</p> <p>For a system like this, our simulator might look like:</p> <div class="highlight"><pre><span></span><span class="c1"># sim.pseudocode</span> <span class="kn">import</span> <span class="s2">&quot;distsys-node.pseudocode&quot;</span> <span class="n">seed</span> <span class="o">=</span> <span class="k">if</span> <span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span> <span class="err">?</span> <span class="nb">int</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">env</span><span class="o">.</span><span class="n">DST_SEED</span><span class="p">)</span> <span class="p">:</span> <span class="n">time</span><span class="o">.</span><span class="n">now</span><span class="p">()</span> <span class="n">rnd</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">new</span><span class="p">(</span><span class="n">seed</span><span class="p">)</span> <span class="k">while</span> <span class="n">true</span><span class="p">:</span> <span class="n">sim_fd</span> <span class="o">=</span> <span class="p">{</span> <span class="n">send</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1"># Inject random failure.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">&#39;bad write&#39;</span><span class="p">)</span> <span 
class="c1"># Inject random latency.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span> <span class="n">n_written</span> <span class="o">=</span> <span class="n">assert_ok</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">fd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">buf</span><span class="p">))</span> <span class="k">return</span> <span class="n">n_written</span> <span class="p">},</span> <span class="n">recv</span><span class="p">(</span><span class="n">fd</span><span class="p">,</span> <span class="n">buf</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1"># Inject random failure.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">&#39;bad read&#39;</span><span class="p">)</span> <span class="c1"># Inject random latency.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span> 
<span class="k">return</span> <span class="n">os</span><span class="o">.</span><span class="n">fd</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">buf</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> <span class="n">sim_io</span> <span class="o">=</span> <span class="p">{</span> <span class="nb">open</span><span class="p">:</span> <span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="c1"># Inject random failure.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="n">throw</span> <span class="n">Error</span><span class="p">(</span><span class="s1">&#39;bad open&#39;</span><span class="p">)</span> <span class="c1"># Inject random latency.</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">.5</span><span class="p">:</span> <span class="k">await</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">())</span> <span class="k">return</span> <span class="n">sim_fd</span> <span class="p">}</span> <span class="p">}</span> <span class="n">all_ports</span> <span class="o">=</span> <span class="p">[</span><span class="mi">6000</span><span class="p">,</span> <span class="mi">6001</span><span class="p">,</span> <span class="mi">6002</span><span class="p">]</span> <span class="n">nodes</span> <span class="o">=</span> <span class="p">[</span> <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span 
class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span> <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span> <span class="k">await</span> <span class="n">distsys</span><span class="o">-</span><span class="n">node</span><span class="o">.</span><span class="n">start</span><span class="p">(</span><span class="n">sim_io</span><span class="p">,</span> <span class="n">all_ports</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">all_ports</span><span class="p">),</span> <span class="p">]</span> <span class="n">history</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">try</span><span class="p">:</span> <span class="n">key</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="n">value</span> <span class="o">=</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand_bytes</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="n">nodes</span><span class="p">[</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">nodes</span><span 
class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)]</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">)</span> <span class="n">history</span><span class="o">.</span><span class="n">add</span><span class="p">((</span><span class="n">key</span><span class="p">,</span> <span class="n">value</span><span class="p">))</span> <span class="n">assert_valid_history</span><span class="p">(</span><span class="n">nodes</span><span class="p">,</span> <span class="n">history</span><span class="p">)</span> <span class="c1"># Crash a process every so often</span> <span class="k">if</span> <span class="n">rnd</span><span class="o">.</span><span class="n">rand</span><span class="p">()</span> <span class="o">&gt;</span> <span class="mf">0.75</span><span class="p">:</span> <span class="n">node</span> <span class="o">=</span> <span class="n">nodes</span><span class="p">[</span><span class="n">rnd</span><span class="o">.</span><span class="n">rand_in_range_inclusive</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="nb">len</span><span class="p">(</span><span class="n">nodes</span><span class="p">)</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)]</span> <span class="n">node</span><span class="o">.</span><span class="n">restart</span><span class="p">()</span> <span class="n">catch</span> <span class="p">(</span><span class="n">e</span><span class="p">):</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;Found logical error with seed: </span><span class="si">%d</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">seed</span><span class="p">)</span> <span class="n">throw</span> <span class="n">e</span> </pre></div> <p>I'm completely hand-waving here to demonstrate the broader point and not any specific testing strategy for a specific distributed system. The important points are that these three nodes run in the same process, on different ports.</p> <p>We control disk IO. We control network IO. 
We control how time elapses. We run a deterministic simulated workload against the three node system while injecting disk, network, and process faults.</p> <p>And we are constantly checking for an invalid state. When we get the invalid state, we can be sure the user can easily recreate this invalid state.</p> <h3 id="other-sources-of-non-determinism">Other sources of non-determinism</h3><p>Within some error margin, most CPU instructions and most CPU behavior are considered to be deterministic. There are, however, certain CPU instructions that are <a href="https://cs.stackexchange.com/questions/132842/under-which-conditions-a-given-program-is-deterministic-on-x86-64-machines/132856#132856">definitely not</a>. Unfortunately that might <a href="https://github.com/facebookexperimental/hermit/issues/34">include</a> system calls. It might also <a href="https://stackoverflow.com/a/8171032">include</a> malloc. There is very little to trust.</p> <p>If we <a href="https://antithesis.com/blog/deterministic_hypervisor/">ignore</a> Antithesis, people doing DST seem not to worry about these smaller bits of nondeterminism. Yet it's generally agreed that DST is still worthwhile anyway. The intuition here is that every bit of non-determinism you can eliminate makes it that much easier to reproduce bugs when you find them.</p> <p>Put another way: determinism, even among DST practitioners, remains a spectrum.</p> <h3 id="considerations">Considerations</h3><p>As you may have noticed already from some of the pseudocode, DST is not a panacea.</p> <h4 id="consideration-1:-edges">Consideration 1: Edges</h4><p>First, because you must swap out non-deterministic parts of your code, you are not actually testing the entirety of your code. You are certainly encouraged to keep the deterministic kernel large. 
But there will always be the non-deterministic edges.</p> <p>Without a system like Antithesis, which gives you an entire deterministic machine, you can't test your whole program.</p> <p>But even with Antithesis you cannot test the <i>integration</i> between your system and external systems. You must mock out the external systems.</p> <p>It's also worth noting that there are many layers at which you could inject simulation. You could do it at a high-level RPC and storage layer. This would be simpler and easier to understand. But then you'd skip testing how your code handles lower-level errors.</p> <h4 id="consideration-2:-your-workload(s)">Consideration 2: Your workload(s)</h4><p>DST depends on the creativity and thoroughness of your workload as much as any other type of test or benchmark does.</p> <p>Just as you wouldn't depend on one single benchmark to qualify your application, you may not want to depend on a single simulated workload.</p> <p>Or as Will Wilson put it for me:</p> <blockquote><p>The biggest challenge of DST in my experience is that tuning all the random distributions, the parameters of your system, the workload, the fault injection, etc. so that it produces interesting behavior is very challenging and very labor intensive. As with fuzzing or PBT, it's terrifyingly easy to build a DST system that appears to be doing a ton of testing, but actually never explores very much of the state space of your system. At FoundationDB, the vast majority of the work we put into the simulator was an iterative process of hunting for what wasn't being covered by our tests and then figuring out how to make the tests better. This process often resembles science more than it does engineering.</p> <p>Unfortunately, unlike with fuzzing, mere branch coverage in your code is usually a pretty poor signal for the kinds of systems you want to test with DST. 
At Antithesis we handle this with <a href="https://antithesis.com/docs/best_practices/sometimes_assertions.html">Sometimes assertions</a>, at FDB we did something pretty similar, and I assume TigerBeetle and others have their own version of this. But of course the ultimate figure of merit is whether your DST system is finding 100% of your bugs. It's quite difficult to get to the point that it does. The truly ambitious part of Antithesis isn't the hypervisor, but the fact that we also aim to solve the much harder "is my DST working?" problem with minimal human guidance or supervision.</p> </blockquote> <h4 id="consideration-3:-your-knowledge-of-what-you-mocked">Consideration 3: Your knowledge of what you mocked</h4><p>When you mock out the behavior of disk or network IO, the benefits of DST are tied to your understanding of the spectrum of behavior that may happen in the real world.</p> <p>What are all possible error conditions? What are the extreme latency bounds of the original method? What about corruption or misdirected IO?</p> <p>The flipside here is that only in deterministic simulation testing can you configure these crazy scenarios to happen at a <i>configurable regularity</i>. You can kick off a set of runs that have especially high IO latency or especially high corrupt reads/writes. Joran and I <a href="https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser">wrote</a> a year ago about how the TigerBeetle simulator does exactly this.</p> <h4 id="consideration-4:-non-reproducible-seeds-as-code-changes">Consideration 4: Non-reproducible seeds as code changes</h4><p>Critically, the reproducibility of DST only helps so long as your <i>code doesn't change</i>. As soon as your code changes, the seed may no longer even get you to the state where the bug was exhibited. 
So the value of DST's reproducibility is more that a failing seed can help you convert that simulation run into an integration test that describes the precise scenario and keeps describing it even as the code changes.</p> <h4 id="consideration-5:-time-and-compute">Consideration 5: Time and compute</h4><p>Because of Consideration 4, you need to keep rerunning the simulator, not just to keep finding new seeds and new histories, but because the histories those seeds produce may change every time you change the code.</p> <h3 id="what-about-jepsen?">What about Jepsen?</h3><p>Jepsen does limited process and network fault injection while testing for linearizability. It's a fantastic project.</p> <p>However, it represents only a subset of what is possible with Deterministic Simulation Testing (if you actually put in the effort described above to get there).</p> <p>But even more importantly, Jepsen has nothing to do with deterministic execution. If Jepsen finds a bug and your system can't do deterministic execution, you may or may not be able to reproduce that Jepsen bug.</p> <p>Here's another Will Wilson <a href="https://antithesis.com/blog/is_something_bugging_you/">quote</a> for you on Jepsen and FoundationDB:</p> <blockquote><p>Anyway, we did [Deterministic Simulation Testing] for a while and found all of the bugs in the database. I know, I know, that’s an insane thing to say. It’s kind of true though. In the entire history of the company, I think we only ever had one or two bugs reported by a customer. Ever. Kyle Kingsbury aka “aphyr” didn’t even bother testing it with Jepsen, because he didn’t think he’d find anything.</p> </blockquote> <h3 id="conclusion">Conclusion</h3><p>The degree to which you can place faith in DST alone, and not time spent in production, has limits. However, it certainly does no harm to employ DST. And, barring the considerations described above, it will likely make the kernel of your product significantly more stable. 
Furthermore, everyone who uses DST knows about these considerations. But I think it's worthwhile to list them out to help folks who do not know DST to build an intuition for what it's excellent at.</p> <p>Further reading:</p> <ul> <li><a href="https://www.youtube.com/watch?v=4fFDFbi3toc">"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson</a></li> <li><a href="https://www.polarsignals.com/blog/posts/2024/05/28/mostly-dst-in-go">(Mostly) Deterministic Simulation Testing in Go</a></li> <li><a href="https://github.com/madsim-rs/madsim">Magical Deterministic Simulator for distributed systems in Rust</a></li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post talking through the basics, considerations, and limitations of Deterministic Simulation Testing.<a href="https://t.co/9Fp5ytL7Wz">https://t.co/9Fp5ytL7Wz</a> <a href="https://t.co/xRE6FOwc0P">pic.twitter.com/xRE6FOwc0P</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1825851204632445377?ref_src=twsrc%5Etfw">August 20, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-08-20-deterministic-simulation-testing.htmlTue, 20 Aug 2024 00:00:00 +0000Delightful, production-grade replication for Postgreshttp://notes.eatonphil.com/2024-07-30-delightful-production-grade-replication-postgres.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/delightful-production-grade-replication-postgres'" /> </head><p>This is an external post of mine. 
Click <a href="https://www.enterprisedb.com/blog/delightful-production-grade-replication-postgres">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2024-07-30-delightful-production-grade-replication-postgres.htmlTue, 30 Jul 2024 00:00:00 +0000A reawakening of systems programming meetupshttp://notes.eatonphil.com/2024-07-07-systems-meetups.html<p>This year has seen a resurgence in really high quality systems programming meetups. <a href="https://www.meetup.com/munich-database-meetup/">Munich Database Meetup</a>, <a href="https://lu.ma/8ujc7st3?tk=DAAbmn">Berlin Systems Group</a>, <a href="https://lu.ma/t6r4mi4v">SF Distributed Systems Meetup</a>, <a href="https://nycsystems.xyz/">NYC Systems</a>, <a href="https://twitter.com/BengaluruSys">Bengaluru Systems</a>, to name a few.</p> <p>This post summarizes a bit of disappointing recent tech meetup history, the new trend of excellent systems programming meetups, and ends with some encouragement and guidance for running your own systems programming events.</p> <p>I will be a little critical in this post but I want to preface by saying: organizing meetups is really tough! It takes a lot of work and I have a huge amount of respect for meetup organizers even when their meetup style did not resonate with me.</p> <p>Although much of this post talks about NYC Systems, the reason I think this post is worth writing is because so many other meetups in a similar vein popped up. I hope to encourage these other meetups and to encourage folks in other major metros (London, for example) to start similar meetups.</p> <h3 id="meetups">Meetups</h3><p>I used to attend a bunch of meetups before the pandemic. But I quickly got disillusioned. Almost every meetup was varying degrees of startups pitching their product. 
The last straw for me was sitting through a talk at a JavaScript meetup that was by a devrel employee of a startup who literally gave a tutorial for their product.</p> <p>There were also some pretty intelligent meetups like the New York Haskell Users Group and the New York Emacs Meetup. But not being an expert in either domain, and the attendees almost solely appearing to be experts, I didn't particularly enjoy going.</p> <p>There were a couple of meetups that felt inclusive for various skill-levels of attendees yet still went into interesting depth. Specifically, <a href="http://www.nylug.org/">New York Linux User Group</a> and <a href="https://paperswelove.org/chapter/newyork/">Papers We Love NYC</a>.</p> <p>These meetups were exceptional because they were language- and framework-agnostic, they would start broad to give you background, but then go deep into a topic. Maybe you only understood 50% of what was covered. But you get exposed to something new from an expert in that domain.</p> <p>Unfortunately, the pandemic happened and these two excellent meetups basically have not come back.</p> <h3 id="a-couple-of-students-in-munich">A couple of students in Munich</h3><p>The pandemic ended and I tried a couple of meetups I thought might be better quality. Rust and Go. But they weren't much better than I remembered. 
People would give a high level talk and brush over all the interesting concepts.</p> <p>I had been thinking of doing an in-person talk series since 2022.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">If I put together a systems/databases/distributed systems meetup in NYC (a physical meetup, not Zoom), who&#39;d be interested (in attending, or presenting, or helping me organize, or donating space)?<br><br>No promises!</p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1574875016067710976?ref_src=twsrc%5Etfw">September 27, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>But I was busy with TigerBeetle until December of 2023 when I was messaged on LinkedIn by <a href="https://x.com/georg_kreuzmayr?lang=en">Georg Kreuzmayr</a>, a graduate student at Technical University of Munich (TUM).</p> <p>Georg and his friends, fellow graduate students at TUM, started a database club: <a href="https://www.tumuchdata.club/">TUMuchData</a>. We got to talking about opportunities for collaboration and I started feeling a bit embarrassed that a graduate student had more guts than I had to get <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">back</a> onto the meetup organizer wagon.</p> <p>A week later, with assurance from <a href="https://twitter.com/justinjaffray">Justin Jaffray</a> that at least he would show up with me if no one else did, I started the <a href="https://eatonphil.com/nyc-systems-coffee-club.html">NYC Systems Coffee Club</a> to bring together folks in NYC interested in any topic of systems programming (e.g. compilers, databases, web browser internals, distributed systems, formal methods, etc.). To bring them together in a completely informal setting for coffee at 9am in the morning in a public space in midtown Manhattan.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Trying something new! 
If you&#39;re a dev in NYC working <br>on (or interested in) systems programming, grab a coffee and come hang out at 1 Bryant Park (indoor space) this Thursday 9AM - 9:30AM.<br><br>See post for details and fill out the Google Form or DM me!<a href="https://t.co/A4bzcPGy6x">https://t.co/A4bzcPGy6x</a> <a href="https://t.co/n1ECMd59ev">pic.twitter.com/n1ECMd59ev</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1734216183459512486?ref_src=twsrc%5Etfw">December 11, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>I set up that linked web page and started collecting subscribers to the club via Google Form. Once a month I'd send an email out to the list asking for RSVPs to this month's coffee club. The first 20 to respond would get a calendar invite.</p> <p><img src="/assets/coffee-club-invite.png" alt="/assets/coffee-club-invite.png"></p> <p>And about the same time I started asking around on Twitter/LinkedIn if someone would be interested in co-organizing a new systems programming meetup in NYC. <a href="https://twitter.com/ngeloxyz">Angelo Saraceno</a> immediately took me up on the idea and we met up.</p> <h3 id="nyc-systems">NYC Systems</h3><p>We agreed on the premise: this would be a language- and framework-agnostic meetup that was focused on engineering challenges, not product pitches. It would be 100% for the sake of corporate marketing, but corporate marketing of the <em>engineering team</em>, not the product.</p> <p><a href="https://nycsystems.xyz/">NYC Systems</a> was born!</p> <p>We'd find speakers who could start broad and dive deep into some interesting aspect of databases, programming languages, distributed systems, and so on. 
Product pitches were necessary to establish a context, but the focus of the talk would be about some interesting recent technical challenge and how they dealt with it.</p> <p>We'd schedule talks only every other month to ease our own burden in organizing and finding great speakers.</p> <p>Once Angelo and I had decided to go forward, the next two challenges were finding speakers and finding a venue. Thanks to Twitter and LinkedIn, finding speakers turned out to be the easy part.</p> <p>It was harder to find a venue. It was surprisingly challenging to find a company in NYC that shared our vision: that the value of being associated with a meetup like this comes from the quality of speakers and audience we can bring in by not allowing transparent product pitches.</p> <p>Almost every company in Manhattan with space we spoke with had a requirement that they have their own speaker each night. That seemed like a bad idea.</p> <p>I think it was especially challenging to find a company willing to relax about branding requirements like this because we were a new meetup.</p> <p>It was pretty frustrating not to find a sympathetic company with space in Manhattan. And the only reason we didn't give up was because Angelo was so adamant that this kind of meetup actually happen. It's always best to start something new with someone else for this exact reason. You can keep each other going.</p> <p>In the end we went with the company that did not insist on their own speaker or their own branding. A Brooklyn-based company whose CEO immediately got in touch with me to say they wanted to host us, <a href="https://trailofbits.com/">Trail of Bits</a>.</p> <h3 id="how-it-works">How it works</h3><p>To keep things easy, I set up a web page on my personal site with information about the meetup. (Eventually we moved this to <a href="https://nycsystems.xyz/">nycsystems.xyz</a>.) I set up a Google Form to collect emails for a mailing list. 
And we started posting about the group on Twitter and LinkedIn.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Very pleased to share the first NYC Systems Talks are taking place next Thursday Feb 22nd 6PM. Hosted by <a href="https://twitter.com/trailofbits?ref_src=twsrc%5Etfw">@trailofbits</a>, with <a href="https://twitter.com/paulgb?ref_src=twsrc%5Etfw">@paulgb</a> and <a href="https://twitter.com/StefanKarpinski?ref_src=twsrc%5Etfw">@StefanKarpinski</a> speaking.<br><br>Space is not infinite, fill out the Google Form if you can attend and would like an invite!<a href="https://t.co/jNssr5v1kJ">https://t.co/jNssr5v1kJ</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1758249063550447768?ref_src=twsrc%5Etfw">February 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>We published the event calendar in advance (an HTML table on the website) and announced each event's speakers a week in advance of the event. I'd send another Google Form to the mailing list taking RSVPs for the night. The first 60 people to respond got a Google Calendar invite.</p> <p><img src="/assets/nyc-systems.png" alt="/assets/nyc-systems.png"></p> <p>It's a bit of work, sure, but I'd do anything to avoid Meetup.com.</p> <p class="note"> It is interesting to see every new systems programming meetup also not pick Meetup.com. The only one that went with it, Munich Database Meetup, is a revival of an existing group, the Munich NoSQL Meetup, and presumably they didn't want to give up their subscribers. Most of the others use lu.ma. </p><p>The mailing list is now over 400 people. And in each event RSVP we have a wait list of 20-30 people. Of course, although 60 people say Yes initially, typically about 50 actually attend.</p> <p>At each event, Trail of Bits provided screens, chairs, food, and drink. 
Angelo had recording equipment so he took over audio/video capture (and later editing and publishing).</p> <p>After each event we'd publish talk videos to our <a href="https://www.youtube.com/@NYCSystems">@NYCSystems</a> YouTube channel.</p> <h3 id="network-effects">Network effects</h3><p>In March 2024, the TUMuchData folks joined <a href="https://x.com/ifesdjeen">Alex Petrov</a>'s Munich NoSQL Meetup to form the Munich Database Meetup. In May, <a href="https://twitter.com/thegeeknarrator">Kaivalya Apte</a> and <a href="https://twitter.com/mgill25">Manish Gill</a> started the Berlin Systems Group, inspired by Alex and the Munich Database Meetup.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I want to start a Berlin Database/Storage systems group, where we have regular meetups, discussions and talks. <br><br>WDYT? <a href="https://twitter.com/mgill25?ref_src=twsrc%5Etfw">@mgill25</a> <a href="https://twitter.com/mehd_io?ref_src=twsrc%5Etfw">@mehd_io</a> <a href="https://twitter.com/ClickHouseDB?ref_src=twsrc%5Etfw">@ClickHouseDB</a> <a href="https://twitter.com/SnowflakeDB?ref_src=twsrc%5Etfw">@SnowflakeDB</a> <a href="https://twitter.com/awscloud?ref_src=twsrc%5Etfw">@awscloud</a> <a href="https://twitter.com/GoogleDE?ref_src=twsrc%5Etfw">@GoogleDE</a> <a href="https://twitter.com/TUBerlin?ref_src=twsrc%5Etfw">@TUBerlin</a> <br><br>Can I get some support? Who else would be interested? 
<a href="https://twitter.com/hashtag/Databases?src=hash&amp;ref_src=twsrc%5Etfw">#Databases</a> <br><br>Thanks…</p>&mdash; Kaivalya Apte - The Geek Narrator (@thegeeknarrator) <a href="https://twitter.com/thegeeknarrator/status/1790782561515372676?ref_src=twsrc%5Etfw">May 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>In May 2024, two PhD students in the San Francisco Bay Area, <a href="https://x.com/ShadajL">Shadaj Laddad</a> and <a href="https://x.com/conor_power23">Conor Power</a>, started the SF Distributed Systems meetup.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">We’re super excited to be organizing a new SF Distributed Systems meetup NEXT WEEK! Our first meetup features <a href="https://twitter.com/julianhyde?ref_src=twsrc%5Etfw">@julianhyde</a> and <a href="https://twitter.com/conor_power23?ref_src=twsrc%5Etfw">@conor_power23</a> presenting work on extending SQL and applying algebraic properties, sign up at <a href="https://t.co/d2lLDaQ5iJ">https://t.co/d2lLDaQ5iJ</a></p>&mdash; Shadaj Laddad (@ShadajL) <a href="https://twitter.com/ShadajL/status/1790767187327889456?ref_src=twsrc%5Etfw">May 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>And in July 2024, <a href="https://twitter.com/shraddhaag">Shraddha Agrawal</a>, <a href="https://twitter.com/anirudhRowjee">Anirudh Rowjee</a> and friends kicked off the first Bengaluru Systems Meetup.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Are you ready, Systems Enthusiasts of Bengaluru?<br><br>Speaking at our first-ever meetup on 6th July, we have:<a href="https://twitter.com/simsimsandy?ref_src=twsrc%5Etfw">@simsimsandy</a> with &quot;Learn about the systems that power GenAI applications&quot; and <a href="https://twitter.com/vivekgalatage?ref_src=twsrc%5Etfw">@vivekgalatage</a> with &quot;The Browser Backstage: Performance vs 
Security&quot; <br>(talks linked below!)</p>&mdash; Bengaluru Systems Meetup (@BengaluruSys) <a href="https://twitter.com/BengaluruSys/status/1808949578307183060?ref_src=twsrc%5Etfw">July 4, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <h3 id="suggestions">Suggestions</h3><p>First off, don't pay for anything yourself. Find a company who will host. At the same time, don't feel the need to give in too much to the demands of the company. I'd be happy to help you think through how to talk about the event with companies. It is mutually beneficial for them to get to give a 5-minute hiring/product pitch and not need to do extensive branding nor to give a 30-minute product tutorial.</p> <p>Second, keep a bit of pressure on speakers to not do an overview talk and not to do a product pitch. Suggest that they tell the story of some interesting recent bug or interesting recent feature. What happened? Why was it hard? What did you learn?</p> <p>Focusing on these types of talks will help you get a really interesting audience.</p> <p>I have been continuously surprised and impressed at the folks who show up for NYC Systems. It's a mix of technical founders in the systems space, pretty experienced developers in the systems space, graduate students, and developers of all sorts.</p> <p>I am certain we can only get these kinds of folks to show up because we avoid product pitch-type talks.</p> <p>Third, finding speakers is still hard! The best approach so far has been to individually message folks in industry and academia who hang out on Twitter. Sending out a public call is easy but doesn't often pan out. So keep an eye on interesting companies in the area.</p> <p>Another avenue I've been thinking about is messaging VC connections to ask them if they know any engineers/technical founders/CTOs in the area who could give an interesting technical talk.</p> <p>Fourth, speak with other organizers! 
I finally met Alex Petrov in person last month and we had a <a href="https://twitter.com/ifesdjeen/status/1806677549038063901">great time</a> talking about the challenges and joys of organizing really high quality meetups.</p> <p>I'm always happy to chat, DMs are open.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post telling a bit of the history behind <a href="https://t.co/NEh1tm8v3Q">https://t.co/NEh1tm8v3Q</a>; why it only exists due to folks like <a href="https://twitter.com/georg_kreuzmayr?ref_src=twsrc%5Etfw">@georg_kreuzmayr</a> and <a href="https://twitter.com/ngeloxyz?ref_src=twsrc%5Etfw">@ngeloxyz</a>; the explosion of systems meetups around the world; and encouragement and suggestions for future organizers!<a href="https://t.co/dwe4TtmXKK">https://t.co/dwe4TtmXKK</a> <a href="https://t.co/ZMLkVYdZDJ">pic.twitter.com/ZMLkVYdZDJ</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1809934997442498812?ref_src=twsrc%5Etfw">July 7, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-07-07-systems-meetups.htmlSun, 07 Jul 2024 00:00:00 +0000A write-ahead log is not a universal part of durabilityhttp://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.html<p>A database does not need a write-ahead log (WAL) to achieve durability. A database can write its long-term data structure durably to disk before returning to a client. Granted, this is a bad idea! And granted, a WAL <b>is</b> critical for durability <b>by design</b> in most databases. But I think it's helpful to understand WALs by understanding what you <b>could</b> do without them.</p> <p>So let's look at what terrible design we can make for a durable database that has no write-ahead log. 
To motivate the idea of, and build an intuition for, a write-ahead log.</p> <p>Thank you to Alex Miller for reviewing a version of this post.</p> <p>But first, what is durability?</p> <h3 id="durability">Durability</h3><p>Durability happens in the context of a request a client makes to a data system (either an embedded system like SQLite or RocksDB or a standalone system like Postgres). Durability is a spectrum of guarantees the server provides when a client requests to write some data: that either the request succeeds and the data is safely written to disk, or the request fails and the client must retry or decide to do something else.</p> <p>It can be difficult to set an absolute definition for durability since different databases have different concepts of what can go wrong with disks (also called a "storage fault model"), or they have no concept at all.</p> <p>Let's start from the beginning.</p> <h4 id="an-in-memory-database">An in-memory database</h4><p>An in-memory database has no durability at all. 
Here is pseudo-code for an in-memory database service.</p> <div class="highlight"><pre><span></span><span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="p">()</span> <span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span> <span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;value&quot;</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span> </pre></div> <p>Throughout this post, for the sake of code brevity, imagine that the environment is concurrent and that data races around shared mutable values like <code>db</code> are protected somehow.</p> <h4 id="writing-to-disk">Writing to disk</h4><p>If we want to achieve the most basic level of durability, we can write this database to a file.</p> <div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;kv.db&quot;</span><span class="p">)</span> <span class="n">db</span> <span class="o">=</span> <span 
class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span> <span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;value&quot;</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span> </pre></div> <p><code>btree.write_to_disk</code> will call <a href="https://linux.die.net/man/2/pwrite">pwrite(2)</a> under the hood. And we'll assume it does copy-on-write for only changed pages. So imagine we have a large database represented by a btree that takes up 10GiB on disk. With the btree algorithm, if we write a single entry to the btree, typically only a single (often 4KiB) page will get written rather than all pages (holding all values) in the tree. 
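</p> <p>To make the single-page write concrete, here is a minimal runnable sketch in Python. It is not what <code>btree.write_to_disk</code> does (that function is pseudo-code in this post). The 4KiB <code>PAGE_SIZE</code>, the helper names <code>write_page</code>, <code>read_page</code> and <code>demo</code>, and the three-page file layout are all assumptions made for illustration, and for brevity the sketch overwrites a page in place rather than doing copy-on-write. It relies on <code>os.pwrite</code> and <code>os.pread</code>, which Python exposes on Unix-like systems.</p>

```python
import os
import tempfile

PAGE_SIZE = 4096  # assumed page size (4KiB)


def write_page(fd: int, page_no: int, data: bytes) -> None:
    """Write exactly one page at its offset; other pages are not touched."""
    if len(data) != PAGE_SIZE:
        raise ValueError("data must be exactly one page")
    written = os.pwrite(fd, data, page_no * PAGE_SIZE)
    if written != PAGE_SIZE:
        raise OSError("short write")  # a real system would retry the remainder


def read_page(fd: int, page_no: int) -> bytes:
    """Read one page back from its offset."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def demo() -> list:
    """Create a three-page file, update only page 1, return each page's prefix."""
    with tempfile.TemporaryDirectory() as tmp:
        fd = os.open(os.path.join(tmp, "kv.db"), os.O_RDWR | os.O_CREAT, 0o644)
        try:
            for page_no in range(3):
                write_page(fd, page_no, bytes(PAGE_SIZE))  # zero-filled pages
            # Updating a single entry dirties a single page.
            write_page(fd, 1, b"key=value".ljust(PAGE_SIZE, b"\x00"))
            return [read_page(fd, n)[:9] for n in range(3)]
        finally:
            os.close(fd)


if __name__ == "__main__":
    print(demo())
```

<p>The second call to <code>write_page</code> hands only the 4096 bytes of page 1 to the operating system; pages 0 and 2 are left alone.</p> <p>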
At the same time, in the worst case, the entire tree (all 10GiB of data) may need to get rewritten.</p> <p>But this code isn't crash-safe. If the virtual or physical machine this code is running on reboots, the data we wrote to the file may not actually be on disk.</p> <h4 id="fsync">fsync</h4><p>File data is buffered by the operating system by default. By general consensus, writing data without flushing the operating system buffer is not considered durable. Every so often a new database will show up on Hacker News claiming to beat all other databases on insert speed until a commenter points out the new database doesn't actually flush data to disk.</p> <p>In other words, the commonly accepted requirement for durability is that not only do you write data to a file on disk but you <a href="https://man7.org/linux/man-pages/man2/fsync.2.html">fsync(2)</a> the file you wrote. This forces the operating system to flush to disk any data it has buffered.</p> <div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;kv.db&quot;</span><span class="p">)</span> <span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span 
class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Force a flush</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span> <span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;value&quot;</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span> </pre></div> <p>Furthermore you must not ignore fsync failure. How you deal with fsync failure is up to you, but exiting immediately with a message that the user should restore from a backup is sometimes considered acceptable.</p> <p>Databases don't like to fsync because it's slow. Many major databases offer modes where they do not fsync data files before returning a success to a client. Postgres <a href="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FSYNC">offers</a> this unsafe mode, though does not default to it and warns against it. MongoDB offers this unsafe mode but <a href="https://www.mongodb.com/docs/manual/core/journaling/#journaling-process">does not default</a> to it.</p> <p class="note"> An earlier version of this post said that MongoDB would unsafely flush on an interval. Daniel Gomez Ferro from MongoDB messaged me that while the docs are confusing, the default write concern "majority" does actually imply "j: true" which means data is synchronized (i.e. fsync-ed) before returning a success to a client. </p><p>Almost every database trades safety for performance in some regard. 
For example, few databases other than SQLite and Cockroach default to Serializable Isolation, even though it is commonly agreed that basically no isolation level below Serializable (which is what all other databases default to) can be reasoned about. Other databases offer Serializable Isolation; they just don't default to it. Because it can be slow.</p> <h4 id="group-commit">Group commit</h4><p>But let's get back to fsync. One way to amortize the cost of fsync is to delay requests so that you write the data from several of them and then issue a single fsync that covers all of them. This is sometimes called group commit.</p> <p>For example, we could update the database in-memory but have a background thread serialize to disk and call fsync only every 5ms.</p> <div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;kv.db&quot;</span><span class="p">)</span> <span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">group_commit_sems</span> <span class="o">=</span> <span class="p">[]</span> <span class="nd">@background_worker</span><span class="p">()</span> <span class="k">def</span> <span class="nf">group_commit</span><span class="p">():</span> <span class="k">for</span><span class="p">:</span> <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">5</span><span class="n">ms</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush for the group</span> 
<span class="k">for</span> <span class="n">sem</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span> <span class="n">sem</span><span class="o">.</span><span class="n">signal</span><span class="p">()</span> <span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">sem</span> <span class="o">=</span> <span class="n">semaphore</span><span class="p">()</span> <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">push</span><span class="p">(</span><span class="n">sem</span><span class="p">)</span> <span class="n">sem</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span> <span class="k">def</span> <span class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;value&quot;</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span> </pre></div> <p>It is critical that <code>handle_write</code> waits to return a success until the write is durable via fsync.</p> <p>So to reiterate, the key idea for durability of a client request 
is that you have some version of the client message stored on disk durably with fsync before returning a success to a client.</p> <p>From now on in this post, when you see "durable" or "durability", it means that the data has been written and fsync-ed to disk.</p> <h3 id="optimizing-durable-writes">Optimizing durable writes</h3><p>A key insight is that it's silly to serialize the entire permanent structure of the database to disk every time a user writes.</p> <p>We could just write the user's message itself to an append-only log. And then only periodically write the entire btree to disk. So long as we have fsync-ed the append-only log file, we can safely return to the user even if the btree itself has not yet been written to disk.</p> <p>The additional logic this requires is that on startup we must read the btree from disk and then replay the log on top of the btree.</p> <div class="highlight"><pre><span></span><span class="n">f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;kv.db&quot;</span><span class="p">,</span> <span class="s2">&quot;rw&quot;</span><span class="p">)</span> <span class="n">db</span> <span class="o">=</span> <span class="n">btree</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">log_f</span> <span class="o">=</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;kv.log&quot;</span><span class="p">,</span> <span class="s2">&quot;rw&quot;</span><span class="p">)</span> <span class="n">l</span> <span class="o">=</span> <span class="n">log</span><span class="o">.</span><span class="n">init_from_disk</span><span class="p">()</span> <span class="k">for</span> <span class="n">log</span> <span class="ow">in</span> <span class="n">l</span><span class="o">.</span><span class="n">read_logs_from</span><span class="p">(</span><span class="n">db</span><span 
class="o">.</span><span class="n">last_log_index</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">log</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">log</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">group_commit_sems</span> <span class="o">=</span> <span class="p">[]</span> <span class="nd">@background_worker</span><span class="p">()</span> <span class="k">def</span> <span class="nf">group_commit</span><span class="p">():</span> <span class="k">for</span><span class="p">:</span> <span class="n">log_accumulator</span> <span class="o">=</span> <span class="n">log_page</span><span class="p">()</span> <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">5</span><span class="n">ms</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="k">for</span> <span class="p">(</span><span class="n">log</span><span class="p">,</span> <span class="n">_</span><span class="p">)</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span> <span class="n">log_accumulator</span><span class="o">.</span><span class="n">add</span><span class="p">(</span><span class="n">log</span><span class="p">)</span> <span class="n">log_f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">log_accumulator</span><span class="o">.</span><span class="n">page</span><span class="p">())</span> <span class="c1"># Write out all log entries at once</span> <span class="n">log_f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush wal data</span> <span class="k">for</span> <span class="p">(</span><span class="n">_</span><span class="p">,</span> <span 
class="n">sem</span><span class="p">)</span> <span class="ow">in</span> <span class="n">group_commit_sems</span><span class="p">:</span> <span class="n">sem</span><span class="o">.</span><span class="n">signal</span><span class="p">()</span> <span class="k">if</span> <span class="n">clock</span><span class="p">()</span> <span class="o">%</span> <span class="mi">1</span><span class="n">m</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="n">db</span><span class="o">.</span><span class="n">write_to_disk</span><span class="p">(</span><span class="n">f</span><span class="p">)</span> <span class="n">f</span><span class="o">.</span><span class="n">fsync</span><span class="p">()</span> <span class="c1"># Durably flush db data</span> <span class="k">def</span> <span class="nf">handle_write</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">db</span><span class="o">.</span><span class="n">update</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">,</span> <span class="n">req</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">sem</span> <span class="o">=</span> <span class="n">semaphore</span><span class="p">()</span> <span class="n">log</span> <span class="o">=</span> <span class="n">req</span> <span class="n">group_commit_sems</span><span class="o">.</span><span class="n">push</span><span class="p">((</span><span class="n">log</span><span class="p">,</span> <span class="n">sem</span><span class="p">))</span> <span class="n">sem</span><span class="o">.</span><span class="n">wait</span><span class="p">()</span> <span class="c1"># This time waiting for only the log to be written and flushed, not the btree.</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{}</span> <span class="k">def</span> <span 
class="nf">handle_read</span><span class="p">(</span><span class="n">req</span><span class="p">):</span> <span class="n">value</span> <span class="o">=</span> <span class="n">db</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="n">req</span><span class="o">.</span><span class="n">key</span><span class="p">)</span> <span class="k">return</span> <span class="mi">200</span><span class="p">,</span> <span class="p">{</span><span class="s2">&quot;value&quot;</span><span class="p">:</span> <span class="n">value</span><span class="p">}</span> </pre></div> <p>This is a write-ahead log!</p> <p>Consider a scenario. One request writes the smallest key ever seen. And another request within the same millisecond writes the largest key ever seen. Writing these to disk on the btree means modifying at least two pages spread out in space on disk.</p> <p>But if we only have to durably write these two messages to a log, they can likely both be included in the same log page. ("Likely" so long as keys and values are small enough that several can fit into the same page.)</p> <p>That is, it's cheaper to write only these small messages representing the client request to disk. And we save the structured btree persistence for a less frequent durable write.</p> <h3 id="filesystem-and-disk-bugs">Filesystem and disk bugs</h3><p>Sometimes filesystems will write data to the wrong place. Sometimes disks corrupt data. A solution to both of these is to checksum the data on write, store the checksum on disk, and confirm the checksum on read. This, combined with a background process called scrubbing that validates unread data, can help you learn quickly when your data has been corrupted and you must recover from backup.</p> <p>MongoDB's default storage engine WiredTiger <b>does</b> checksum data <a href="https://github.com/wiredtiger/wiredtiger/blob/develop/src/docs/tune-checksum.dox#L3">by default</a>.</p> <p>But some databases famous for integrity do not. 
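</p> <p>For intuition, the checksum-on-write, verify-on-read idea can be sketched in a few lines of Python. This is an illustration only, not how any particular database lays out its pages: the 4KiB <code>PAGE_SIZE</code>, the choice of CRC32 stored in the last 4 bytes of the page, and the names <code>seal_page</code>, <code>open_page</code> and <code>CorruptPage</code> are all assumptions of this sketch.</p>

```python
import struct
import zlib

PAGE_SIZE = 4096       # assumed page size for this sketch
CHECKSUM_SIZE = 4      # a CRC32 fits in 4 bytes
BODY_SIZE = PAGE_SIZE - CHECKSUM_SIZE


class CorruptPage(Exception):
    """Raised when a page's stored checksum does not match its contents."""


def seal_page(payload: bytes) -> bytes:
    """Pad the payload to a full page body and append its CRC32 (on write)."""
    if len(payload) > BODY_SIZE:
        raise ValueError("payload does not fit in one page")
    body = payload.ljust(BODY_SIZE, b"\x00")
    return body + struct.pack("<I", zlib.crc32(body))


def open_page(page: bytes) -> bytes:
    """Verify the trailing CRC32 and return the page body (on read)."""
    if len(page) != PAGE_SIZE:
        raise CorruptPage("short or oversized page")
    body, trailer = page[:BODY_SIZE], page[BODY_SIZE:]
    (stored,) = struct.unpack("<I", trailer)
    if zlib.crc32(body) != stored:
        raise CorruptPage("checksum mismatch")
    return body


if __name__ == "__main__":
    page = seal_page(b"key=value")
    assert open_page(page).rstrip(b"\x00") == b"key=value"
    damaged = bytearray(page)
    damaged[0] ^= 0x01  # simulate a single bit flip on disk
    try:
        open_page(bytes(damaged))
    except CorruptPage as e:
        print("detected:", e)
```

<p>In this framing, a scrubber would be a background loop that calls something like <code>open_page</code> on every page, including ones no client has asked for recently, so that corruption is noticed before the data is needed.</p> <p>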
Postgres does <a href="https://www.postgresql.org/docs/current/checksums.html">no data checksumming</a> by default:</p> <blockquote><p>By default, data pages are not protected by checksums, but this can optionally be enabled for a cluster. When enabled, each data page includes a checksum that is updated when the page is written and verified each time the page is read. Only data pages are protected by checksums; internal data structures and temporary files are not.</p> </blockquote> <p>SQLite likewise does no checksumming by default. Checksumming is an <a href="https://www.sqlite.org/cksumvfs.html">optional extension</a>:</p> <blockquote><p>The checksum VFS extension is a VFS shim that adds an 8-byte checksum to the end of every page in an SQLite database. The checksum is added as each page is written and verified as each page is read. The checksum is intended to help detect database corruption caused by random bit-flips in the mass storage device.</p> </blockquote> <p>But even this isn't perfect. Disks and nodes can fail completely. At that point you can only improve durability by introducing redundancy across disks (and/or nodes), for example, via distributed consensus.</p> <h3 id="other-reasons-you-<em>need</em>-a-wal?">Other reasons you <em>need</em> a WAL?</h3><p>Some databases (like SQLite) require a write-ahead log to implement aspects of ACID transactions. But this need not be a requirement for ACID transactions if you do MVCC (SQLite does not). See my previous post on <a href="https://notes.eatonphil.com/2024-05-16-mvcc.html">implementing MVCC</a> for details.</p> <p>Logical replication (also called change data capture (CDC)) is another interesting feature that requires a write-ahead log. The idea is that the log already preserves the exact order and changes that affect the database's "state machine". 
So we could copy these changes out of the system by tracking the write-ahead log and, preserving their order, apply them to a foreign system.</p> <p>But again, CDC is not about durability. It's an ancillary feature that write-ahead logs make simple.</p> <h3 id="conclusion">Conclusion</h3><p>A few key points. First, durability primarily matters if it is established before returning a success to the client. Second, a write-ahead log is a cheap way to get durability.</p> <p>And finally, durability is a spectrum. You need to read the docs for your database to understand what it does and does not guarantee.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here&#39;s a new post about durability and write-ahead logs. Write-ahead logs are used almost everywhere. But to build an intuition for why, it is helpful to imagine what you would do without a WAL. And to explore the meaning of durability.<a href="https://t.co/nzS2pMz22z">https://t.co/nzS2pMz22z</a> <a href="https://t.co/m1n9x8CNcp">pic.twitter.com/m1n9x8CNcp</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1807741130093556098?ref_src=twsrc%5Etfw">July 1, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-07-01-a-write-ahead-log-is-not-a-universal-part-of-durability.htmlMon, 01 Jul 2024 00:00:00 +0000The limitations of LLMs, or why are we doing RAG?http://notes.eatonphil.com/2024-06-17-limitations-llm-or-why-are-we-doing-rag.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/limitations-llm-or-why-are-we-doing-rag'" /> </head><p>This is an external post of mine. 
Click <a href="https://www.enterprisedb.com/blog/limitations-llm-or-why-are-we-doing-rag">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2024-06-17-limitations-llm-or-why-are-we-doing-rag.htmlMon, 17 Jun 2024 00:00:00 +0000Confusion is a musehttp://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.html<p>Some of the most interesting technical blog posts I read come from, and a common reason for posts I write is, confusion. You're at work and you start asking questions that are difficult to answer. You spend a few hours or a day trying to get to the bottom of things.</p> <p>If you ask a question to very experienced and successful developers at work, they have a tendency not to give context and to simplify things down to a single answer. This may be a good way to make business decisions. (One can't afford to waste an eternity considering everything indefinitely.) But accepting an answer you don't understand is actively harmful for building intuition.</p> <p>Certainly, sometimes not accepting an answer can be irritating. You'll have to figure that out.</p> <p>But beyond "go along to get along", another reason we don't pursue what we're confused about is because we're embarrassed that we're confused in the first place. What's worse, the embarrassment we feel naturally grows the more experienced we get. "I've got this job title, I don't want to seem like I don't know what you mean."</p> <p>But if you fight the embarrassment and pursue your confusion regardless, you'll likely figure something very interesting out. Moreover, you will probably not have been the only person who was confused. At least personally it is quite rare that I am confused about something and no one else is.</p> <p>So pay attention when you get confused, and consider why it happened. What did you expect to be the case, and how did reality differ? Explore the angles and the options. When you finally understand, think about what led you to that understanding.</p> <p>Write it down. 
Put it into an internal Markdown doc, an internal Atlassian doc, an internal Google Slides page, whatever. The medium doesn't matter.</p> <p>This entire process doesn't come easily. We feel embarrassed. We aren't used to lingering on something we're confused by. We aren't used to writing things down.</p> <p>But if you can make yourself pause every once in a while and think about what you (or someone around you) got confused by, and if you can force yourself to stop getting embarrassed by what you got confused by, and if you can write down the background and the reasoning that led to your ultimate understanding, you're going to have something pretty interesting to talk about.</p> <p>You'll contribute to the growth and intuition of your colleagues. And you'll never run out of things to write about.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Confusion is embarrassing. But fight that feeling, and dig into why you&#39;re confused. And write it down.<br><br>You won&#39;t be the only one who was confused. And you&#39;ll tend to have something pretty interesting to talk about.<a href="https://t.co/IdX1nGBheR">https://t.co/IdX1nGBheR</a> <a href="https://t.co/KzTjqMxw6u">pic.twitter.com/KzTjqMxw6u</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1801644601536664014?ref_src=twsrc%5Etfw">June 14, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-06-14-confusion-is-a-muse.htmlFri, 14 Jun 2024 00:00:00 +0000How I run a software book clubhttp://notes.eatonphil.com/2024-05-30-how-i-run-book-clubs.html<p>I've been running software book clubs almost continuously since last summer, about 12 months ago. 
We read through <a href="https://eatonphil.com/2023-ddia.html">Designing Data-Intensive Applications</a>, <a href="https://eatonphil.com/2023-database-internals.html">Database Internals</a>, <a href="https://eatonphil.com/2024-systems-performance.html">Systems Performance</a>, and we just started <a href="https://eatonphil.com/2024-understanding-software-dynamics.html">Understanding Software Dynamics</a>.</p> <p>The DDIA discussions were in-person in NYC with about 5-8 consistent attendees. The rest have been over email with 300, 500, and 600 attendees.</p> <p>This post is for folks who are interested in running their own book club. None of these ideas are novel. I co-opted the best parts I saw from other people running similar things. And hopefully you'll improve on my experience too, should you try.</p> <p>Despite the length of this post, running a book club takes almost no noticeable effort, other than when I need to select and confirm discussion leaders. It is thanks to how little effort is required that I've kept up the book clubs so consistently.</p> <h3 id="google-groups">Google Groups</h3><p>I run the virtual book clubs over email. I create a Google Group and tell people to send me their email for an invite. I use a Google Form to collect emails since I get many. If you're doing a small group book club you can just collect member emails directly.</p> <p>In the Google Form I ask people to volunteer to lead discussion for a chapter (or chapters). And I ask for a Twitter/GitHub/LinkedIn account.</p> <p>When I've gotten enough responses I go through the list and check Twitter/GitHub/LinkedIn info to find people who might have a particularly interesting perspective to lead a discussion.</p> <p>"Lead a discussion" sounds formal but I mean anything but. 
All I am looking for is someone to start a new Google Group thread each week and for them to share their thoughts.</p> <p>For example, a discussion leader might share:</p> <ul> <li>What they liked about the chapter</li> <li>Something new they learned from the chapter</li> <li>A story about their work that the chapter reminded them of</li> <li>A little project they hacked on, inspired by reading the chapter</li> <li>A paper or YouTube video this chapter reminded them of</li> <li>Something from the chapter that was confusing</li> <li>Etc.</li> </ul> <p>The "discussion leader" has no responsibility for remaining in the discussion after posting the thread. There just isn't an easier way to say "person who kicks off discussion" than to call them a "discussion leader".</p> <p>By the way, I didn't do discussion leaders for the first book club, reading DDIA. And that book club took noticeably more effort. Because I organized it, I was effectively the discussion leader every time. Having discussion leaders disperses the effort of the book club. And I think it makes the club much more interesting.</p> <h4 id="sparknotes-ification">SparkNotes-ification</h4><p>One thing I noticed happening often was that the discussion leader might write a long summary of the chapter. I greatly appreciate and respect that effort, but I don't think it's ideal. Of course you can't control what people do and maybe they really wanted to write a summary. But since noticing this happen I now try to discourage the discussion leader from summarizing since 1) it must be quite time-consuming and 2) it isn't as interesting as some of the above bullet points.</p> <h4 id="confirming-with-leaders">Confirming with leaders</h4><p>When I have picked out folks who seem like they'd be fun discussion leaders, I bcc email them all asking them to confirm. At the same time I explain what being a discussion leader means. 
As I just explained it here above.</p> <p>Each week's discussion gets a new Google Group thread. Discussion happens in responses to the thread.</p> <p>I ask the discussion leaders to create the new discussion thread between Friday and Saturday their local time.</p> <p>For folks who don't confirm, I email them one last time and then if they still haven't confirmed I find someone new.</p> <p>I always lead the first week's discussion so that the discussion leaders can see what I do and so that I can establish the pattern.</p> <h4 id="managing-leaders">Managing leaders</h4><p>It takes a while to read a book. Sometimes the leaders forget to do their part. If it gets to be Sunday and the discussion leader for the week hasn't started discussion, I email them to gently ask if they are still available to kick off discussion. And if they are not, no worries, I can step in.</p> <p>I have had to step in a few times to start discussion and it's no problem.</p> <h4 id="managing-non-leaders">Managing non-leaders</h4><p>Just as you need to clarify and set expectations for discussion leaders, you need to clarify and set expectations for everyone else.</p> <p>When I invite people to the Google Group I typically also create an Intro thread where I explain the discussion format.</p> <p>An annoying aspect of Google Groups is that I cannot limit who can <em>create</em> a thread without limiting who can <em>respond</em> to a thread.</p> <p>It would simplify things for me if I could limit thread creation to discussion leaders. But since I cannot, I try to repeatedly and explicitly mention in the Intro thread that no one should start a new discussion thread unless they are a discussion leader. And that new threads will come out each weekend to discuss the previous chapter.</p> <h4 id="setting-the-tone">Setting the tone</h4><p>One of the most important things to do in the Intro email is to set the tone. 
I try to clarify this is a friendly and encouraging group focused on learning and improving ourselves. We have experts in the group and we have noobs in the group and they are all welcome and will all come away with different things.</p> <h3 id="why-email?">Why email?</h3><p>Email seems to be the most time-friendly and demographic-friendly medium. Doing live discussion sounds stressful and difficult to schedule, although I believe Alex Petrov <a href="https://x.com/ifesdjeen/status/1795813863197409384">runs live discussions</a>. Email forces you to slow down and think things through. And email is built around an inbox. If you didn't get to read some discussion, you can mark it unread. You can't do that in Discord or Slack.</p> <h3 id="avoiding-long-term-commitments">Avoiding long-term commitments</h3><p>When I pick a book, aside from picking books I think are likely to be exceptionally well-written, I try to avoid books that we could not finish within 3 months. It concerns me to try to get people to commit to something longer than that.</p> <p>This has led to some distortion though. Systems Performance has only 16 chapters. One chapter a week means about 3 months in total. But each chapter is 100 pages long.</p> <p>I was hesitant to do a reading of Understanding Software Dynamics because it has 28 chapters. But each chapter is only 10-15 pages long. So when I decided to go with it, I decided we'd read 2 chapters a week. Each discussion leader is responsible for 2 chapters at a time. That means we can finish within 3 months. And each week we read only 20-30 pages, which is still much more doable than 100 pages of Systems Performance.</p> <p>On the other hand, we did make it through Systems Performance! Which gives me confidence to pick other books that are physically daunting, should they otherwise seem like a good idea.</p> <h4 id="a-book-ends">A book ends</h4><p>Many public book clubs go through a book a month and have no ending. That is totally fair. 
But what I love about the way I organize book clubs is that each reading is unrelated to the next. It's an entirely new signup for each book. You need only "commit" (I mean, you can drop off whenever and definitely people do) to a 3-month reading and then you can justly feel good about yourself and join again in the future or not.</p> <p>In contrast a paper reading club has no obvious ending, unless you pick all the papers in advance and organize them around a school year or something. This has made running a paper reading club feel more concerning to me. Though I greatly appreciate folks like Aleksey Charapko and Murat Demirbas <a href="https://charap.co/category/reading-group/">who do</a>.</p> <h3 id="most-people-don't-actively-contribute,-but-they-still-value-it">Most people don't actively contribute, but they still value it</h3><p>In a group of 500 people, maybe 1-2% of those people actively contribute to discussion. 5-10 people. But I often hear from people who didn't participate that they still highly valued the group. And this high percentage of non-active-participants is part of why I keep allowing the group size to grow. There's little work I have to do and a bunch of people benefit.</p> <h3 id="doing-it-at-your-company-likely-won't-go-well">Doing it at your company likely won't go well</h3><p>I wrote about this <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">before</a>. For some reason it's hard to get people who would otherwise join an external reading club to join a company-internal reading club.</p> <p>Though perhaps I'm just doing it wrong because I hear of others like <a href="https://twitter.com/sqlliz/status/1745463496161325087">Elizabeth Garrett Christensen</a> who run an internal software book club successfully.</p> <h3 id="good-luck,-have-fun!">Good luck, have fun!</h3><p>That's all I've got. Send me questions if you've got any. 
But mostly, just give it a shot if you want to and you'll learn!</p> <p>And if you still don't get it, you can of course just join one of my book clubs. :)</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Since folks have asked, here&#39;s how I run a software book club.<br><br>But also, you could just join and see. :)<a href="https://t.co/tXBrLFYbvC">https://t.co/tXBrLFYbvC</a> <a href="https://t.co/4iW8EfZCeY">pic.twitter.com/4iW8EfZCeY</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1796159854496600164?ref_src=twsrc%5Etfw">May 30, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-05-30-how-i-run-book-clubs.htmlThu, 30 May 2024 00:00:00 +0000Implementing MVCC and major SQL transaction isolation levelshttp://notes.eatonphil.com/2024-05-16-mvcc.html<p>In this post we'll build a database in 400 lines of code with basic support for five major SQL transaction isolation levels: Read Uncommitted, Read Committed, Repeatable Read, Snapshot Isolation and Serializable. We'll use multi-version concurrency control (MVCC) and optimistic concurrency control (OCC) to accomplish this. The goal isn't to be perfect but to explain the basics in a minimal way.</p> <p>You don't need to know what these terms mean in advance. I did not understand them before doing this project. But if you've ever dealt with SQL databases, transaction isolation levels are likely one of the dark corners you either 1) weren't aware of or 2) wanted not to think about. At least, this is how I felt.</p> <p>While there are many blog posts that list out isolation levels, I haven't been able to internalize their lessons. So I built this little database to demonstrate the common isolation levels for myself. 
It turned out to be simpler than I expected, and made the isolation levels much easier to reason about.</p> <p>Thank you to Justin Jaffray, Alex Miller, Sujay Jayakar, Peter Veentjer, and Michael Gasch for providing feedback and suggestions.</p> <p>All code is <a href="https://github.com/eatonphil/gomvcc">available</a> on GitHub.</p> <h3 id="why-do-we-need-transaction-isolation?">Why do we need transaction isolation?</h3><p>If you already know the answer, feel free to skip this section.</p> <p>When I first started working with databases in CRUD applications, I did not understand the point of transactions. I was fairly certain that transactions are locks. I was wrong about that, but more on that later.</p> <p>I can't remember the exact code I wrote, but here's something I could have written:</p> <div class="highlight"><pre><span></span><span class="k">with</span> <span class="n">database</span><span class="o">.</span><span class="n">transaction</span><span class="p">()</span> <span class="k">as</span> <span class="n">t</span><span class="p">:</span>
    <span class="n">users</span> <span class="o">=</span> <span class="n">t</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">&quot;SELECT * FROM users WHERE group = &#39;admin&#39;;&quot;</span><span class="p">)</span>
    <span class="n">ids</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">users</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">some_complex_logic</span><span class="p">(</span><span class="n">user</span><span class="p">):</span>
            <span class="n">ids</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">user</span><span class="o">.</span><span class="n">id</span><span class="p">)</span>
    <span class="n">t</span><span class="o">.</span><span class="n">query</span><span class="p">(</span><span class="s2">&quot;UPDATE users SET metadata = &#39;some value&#39; WHERE id IN ($1);&quot;</span><span class="p">,</span> <span class="n">ids</span><span class="p">)</span>
</pre></div> <p>I would have thought that all users that were seen from the initial <code>SELECT</code> who matched the <code>some_complex_logic</code> filter would be exactly the same users that are updated in my second SQL statement.</p> <p>And if I were using SQLite, my guess would have been correct. But if I were using MySQL or Postgres or Oracle or SQL Server, and hadn't made any changes to defaults, that wouldn't necessarily be true! We'll discover exactly why throughout this post.</p> <p>For example, some other connection and transaction could have set a <code>user</code>'s <code>group</code> to <code>admin</code> after the initial <code>SELECT</code> was executed. It would then be missed from the <code>some_complex_logic</code> check and from the subsequent <code>UPDATE</code>.</p> <p>Or, again after our initial <code>SELECT</code>, some other connection could have modified the <code>group</code> for some user that previously was <code>admin</code>. It would then be incorrectly part of the second <code>UPDATE</code> statement.</p> <p>These are just a few examples of what could go wrong.</p> <p>This is the realm of transaction isolation. How do multiple transactions running at the same time, interacting with the same data, interact with each other?</p> <p>The answer is: it depends. The SQL standard itself loosely prescribes four isolation levels. But every database implements these four levels slightly differently. Sometimes using entirely different algorithms. And even among the standard levels, the default isolation level for each database differs.</p> <p>Funky bugs that can show up across databases and across isolation levels, often dependent on particular details of common ways of implementing isolation levels, are called "anomalies". 
Examples include intimidating terms like "dirty reads" and "write cycles" and G2-Item.</p> <p>The topic is so complex that we've got decades of research papers <a href="https://15721.courses.cs.cmu.edu/spring2019/papers/02-transactions/p1-berenson.pdf">critiquing</a> SQL isolation levels, <a href="https://pmg.csail.mit.edu/papers/icde00.pdf">categorization</a> of common isolation anomalies, walkthroughs of anomalies by Martin Kleppmann in <a href="https://dataintensive.net/">Designing Data-Intensive Applications</a>, Martin Kleppmann's <a href="https://github.com/ept/hermitage">Hermitage</a> project documenting common anomalies across isolation levels in major databases, and the <a href="http://www.bailis.org/papers/acidrain-sigmod2017.pdf">ACIDRain paper</a> showing isolation-related bugs in major open-source ecommerce projects.</p> <p>These aren't just random links. They're each quite interesting. And particularly for practitioners who don't know why they should care, check out Designing Data-Intensive Applications and the last link on ACIDRain.</p> <p>And this is only a small list of some of the most interesting research and writing on the topic.</p> <p>So there's a wide variety of things to consider:</p> <ul> <li>Not every database implements transaction isolation levels identically, resulting in different behavior</li> <li>Not all researchers agree, and not all database developers agree, on what any given isolation level means</li> <li>Not every database has the same default isolation level, and most developers tend not to change the default</li> <li>Not every developer is correctly using the isolation level they pick (default or not)</li> </ul> <p>Transaction isolation levels are basically vibes. The only truth for real projects is Martin Kleppmann's <a href="https://github.com/ept/hermitage">Hermitage</a> project that catalogs behavior across databases. 
And a truth some people align with is <a href="https://pmg.csail.mit.edu/papers/icde00.pdf">Generalized Isolation Level Definitions</a>.</p> <p>So while all these linked works above are authoritative, and even though we can see that there might be some anomalies we have to worry about, the research can still be difficult to internalize. And many developers, my recent self included, do not have a great understanding of isolation levels.</p> <p>Throughout this post we'll stick to informal definitions of isolation levels to keep things simple.</p> <p>Let's dig in.</p> <h3 id="locks?-mvcc?">Locks? MVCC?</h3><p>Historically, databases implemented isolation with locking algorithms such as <a href="https://faculty.cc.gatech.edu/~jarulraj/courses/8803-s22/slides/13-two-phase-locking-annotated.pdf">Two-Phase Locking</a> (not the same thing as <a href="https://www.cs.princeton.edu/courses/archive/fall16/cos418/docs/L6-2pc.pdf">Two-Phase Commit</a>). Multi-version concurrency control (MVCC) is an approach that lets us completely avoid locks.</p> <p>It's worth noting that while we will not use locks at all (implementing what is called optimistic concurrency control or OCC), most MVCC databases do still use locks for certain things (implementing what is called pessimistic concurrency control).</p> <p>But this is the story of databases in general. There are numerous ways to implement things.</p> <p>We will take the simpler lockless route.</p> <p>Consider a key-value database. With MVCC, rather than storing only the value for a key, we would store versions of the value. 
The version includes the transaction id (a monotonically increasing integer) wherein the version was created, and the transaction id wherein the version was deleted.</p> <p>With MVCC, it is possible to express transaction isolation levels almost solely as a set of different visibility rules for a version of a value; rules that vary by isolation level.</p> <p>So we will build up a general framework first and discuss and implement each isolation level last.</p> <h3 id="scaffolding">Scaffolding</h3><p>We'll build an in-memory key-value system that acts on transactions. I usually try to stick with only the standard library for projects like this but I really wanted a sorted data structure and Go's standard library doesn't provide one.</p> <p>In <code>main.go</code>, let's set up basic helpers for assertions and debugging:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;slices&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/tidwall/btree&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">msg</span><span 
class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">assertEq</span><span class="p">[</span><span class="nx">C</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">a</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">C</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s &#39;%v&#39; != &#39;%v&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">var</span><span class="w"> </span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">slices</span><span class="p">.</span><span class="nx">Contains</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;--debug&quot;</span><span class="p">)</span> <span 
class="kd">func</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">DEBUG</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;[DEBUG]&quot;</span><span class="p">},</span><span class="w"> </span><span class="nx">a</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">args</span><span class="o">...</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>As mentioned previously, a value in the database will be defined with start and end transaction ids.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Value</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">txStartId</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">txEndId</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> </pre></div> <p>Every transaction will be in an in-progress, aborted, or committed state.</p> <div 
class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">TransactionState</span><span class="w"> </span><span class="kt">uint8</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">InProgressTransaction</span><span class="w"> </span><span class="nx">TransactionState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">AbortedTransaction</span> <span class="w"> </span><span class="nx">CommittedTransaction</span> <span class="p">)</span> </pre></div> <p>And we'll support a few major isolation levels.</p> <div class="highlight"><pre><span></span><span class="c1">// Loosest isolation at the top, strictest isolation at the bottom.</span> <span class="kd">type</span><span class="w"> </span><span class="nx">IsolationLevel</span><span class="w"> </span><span class="kt">uint8</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">ReadUncommittedIsolation</span><span class="w"> </span><span class="nx">IsolationLevel</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">ReadCommittedIsolation</span> <span class="w"> </span><span class="nx">RepeatableReadIsolation</span> <span class="w"> </span><span class="nx">SnapshotIsolation</span> <span class="w"> </span><span class="nx">SerializableIsolation</span> <span class="p">)</span> </pre></div> <p>We'll get into detail about the meaning of the levels later.</p> <p>A transaction has an isolation level, an id (a monotonically increasing integer), and a current state. And although we won't make use of this data yet, transactions at stricter isolation levels will need some extra info. 
Specifically, stricter isolation levels need to know about other transactions that were in-progress when this one started. And stricter isolation levels need to know about all keys read and written by a transaction.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Transaction</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">isolation</span><span class="w"> </span><span class="nx">IsolationLevel</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">TransactionState</span> <span class="w"> </span><span class="c1">// Used only by Repeatable Read and stricter.</span> <span class="w"> </span><span class="nx">inprogress</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span> <span class="w"> </span><span class="c1">// Used only by Snapshot Isolation and stricter.</span> <span class="w"> </span><span class="nx">writeset</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span> <span class="w"> </span><span class="nx">readset</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span> <span class="p">}</span> </pre></div> <p>We'll discuss why later.</p> <p>Finally, the database itself will have a default isolation level that each transaction will inherit (for our own convenience in tests).</p> <p>The database will have a mapping of keys to an array of value versions. 
Later elements in the array will represent newer versions of a value.</p> <p>The database will also store the next free transaction id it will use to assign ids to new transactions.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Database</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="nx">IsolationLevel</span> <span class="w"> </span><span class="nx">store</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span class="nx">Value</span> <span class="w"> </span><span class="nx">transactions</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Map</span><span class="p">[</span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">Transaction</span><span class="p">]</span> <span class="w"> </span><span class="nx">nextTransactionId</span><span class="w"> </span><span class="kt">uint64</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span><span class="w"> </span><span class="nx">Database</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">Database</span><span class="p">{</span> <span class="w"> </span><span class="nx">defaultIsolation</span><span class="p">:</span><span class="w"> </span><span class="nx">ReadCommittedIsolation</span><span class="p">,</span> <span class="w"> </span><span class="nx">store</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">][]</span><span 
class="nx">Value</span><span class="p">{},</span> <span class="w"> </span><span class="c1">// The `0` transaction id will be used to mean that</span> <span class="w"> </span><span class="c1">// the id was not set. So all valid transaction ids</span> <span class="w"> </span><span class="c1">// must start at 1.</span> <span class="w"> </span><span class="nx">nextTransactionId</span><span class="p">:</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p class="note"> To be thread-safe: <code>store</code>, <code>transactions</code>, and <code>nextTransactionId</code> should be guarded by a mutex. But to keep the code small, this post will not use goroutines and thus does not need mutexes. </p><p>There's a bit of book-keeping when creating a transaction, so we'll make a dedicated method for this. We must give the new transaction an id, store all in-progress transactions, and add it to database transaction history.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">inprogress</span><span class="p">()</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">uint64</span><span class="p">]</span> <span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">ok</span><span class="p">;</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">().</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">InProgressTransaction</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ids</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ids</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">newTransaction</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span 
class="nx">Transaction</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Transaction</span><span class="p">{}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">isolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">defaultIsolation</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">InProgressTransaction</span> <span class="w"> </span><span class="c1">// Assign and increment transaction id.</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">nextTransactionId</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">nextTransactionId</span><span class="o">++</span> <span class="w"> </span><span class="c1">// Store all inprogress transaction ids.</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">inprogress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">inprogress</span><span class="p">()</span> <span class="w"> </span><span class="c1">// Add this transaction to history.</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Set</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> 
</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;starting transaction&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">t</span> <span class="p">}</span> </pre></div> <p>And we'll add a few more helpers for completing a transaction, for fetching a transaction by id, and for validating a transaction.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">completeTransaction</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">,</span><span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">TransactionState</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;completing transaction &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Update transactions.</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">state</span> <span class="w"> </span><span class="nx">d</span><span class="p">.</span><span 
class="nx">transactions</span><span class="p">.</span><span class="nx">Set</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">transactionState</span><span class="p">(</span><span class="nx">txId</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="nx">Transaction</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">d</span><span class="p">.</span><span class="nx">transactions</span><span class="p">.</span><span class="nx">Get</span><span class="p">(</span><span class="nx">txId</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;valid transaction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">assertValidTransaction</span><span 
class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;valid id&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">d</span><span class="p">.</span><span class="nx">transactionState</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">id</span><span class="p">).</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">InProgressTransaction</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;in progress&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>The final bit of scaffolding we'll set up is an abstraction for database connections. A connection will have at most one associated transaction. Users must ask the database for a new connection. 
Then within the connection they can manage a transaction.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Connection</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="p">)</span><span class="w"> </span><span class="nx">execCommand</span><span class="p">(</span><span class="nx">command</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">command</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> <span 
class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="p">)</span><span class="w"> </span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unexpected error&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">newConnection</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">Connection</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Connection</span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="nx">d</span><span class="p">,</span> <span class="w"> </span><span class="nx">tx</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>And that's it for scaffolding. Now set up the go module and make sure this builds.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>gomvcc <span class="go">go: creating new go.mod: module gomvcc</span> <span class="go">go: to add module requirements and sums:</span> <span class="go"> go mod tidy</span> <span class="gp">$ </span>go<span class="w"> </span>mod<span class="w"> </span>tidy <span class="go">go: finding module for package github.com/tidwall/btree</span> <span class="go">go: found github.com/tidwall/btree in github.com/tidwall/btree v1.7.0</span> <span class="gp">$ </span>go<span class="w"> </span>build <span class="gp">$ </span>./gomvcc <span class="go">panic: unimplemented</span> <span class="go">goroutine 1 [running]:</span> <span class="go">main.main()</span> <span class="go"> /Users/phil/tmp/main.go:166 +0x2c</span> </pre></div> <p>Great!</p> <h3 id="transaction-management">Transaction management</h3><p>When the user asks to begin a transaction, we ask the database for a new transaction and assign it to the current 
connection.</p> <div class="highlight"><pre><span></span><span class="w"> </span>func (c *Connection) execCommand(command string, args []string) (string, error) { <span class="w"> </span> debug(command, args) <span class="gi">+ if command == &quot;begin&quot; {</span> <span class="gi">+ assertEq(c.tx, nil, &quot;no running transactions&quot;)</span> <span class="gi">+ c.tx = c.db.newTransaction()</span> <span class="gi">+ c.db.assertValidTransaction(c.tx)</span> <span class="gi">+ return fmt.Sprintf(&quot;%d&quot;, c.tx.id), nil</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> // TODO <span class="w"> </span> return &quot;&quot;, fmt.Errorf(&quot;unimplemented&quot;) <span class="w"> </span>} </pre></div> <p>To abort a transaction, we call the <code>completeTransaction</code> method (which makes sure the database transaction history gets updated) with the <code>AbortedTransaction</code> state.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return fmt.Sprintf(&quot;%d&quot;, c.tx.id), nil <span class="w"> </span> } <span class="gi">+ if command == &quot;abort&quot; {</span> <span class="gi">+ c.db.assertValidTransaction(c.tx)</span> <span class="gi">+ err := c.db.completeTransaction(c.tx, AbortedTransaction)</span> <span class="gi">+ c.tx = nil</span> <span class="gi">+ return &quot;&quot;, err</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> // TODO <span class="w"> </span> return &quot;&quot;, fmt.Errorf(&quot;unimplemented&quot;) <span class="w"> </span>} </pre></div> <p>And to commit a transaction is similar.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return &quot;&quot;, err <span class="w"> </span> } <span class="gi">+ if command == &quot;commit&quot; {</span> <span class="gi">+ c.db.assertValidTransaction(c.tx)</span> <span class="gi">+ err := c.db.completeTransaction(c.tx, CommittedTransaction)</span> <span class="gi">+ c.tx = 
nil</span> <span class="gi">+ return &quot;&quot;, err</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> // TODO <span class="w"> </span> return &quot;&quot;, fmt.Errorf(&quot;unimplemented&quot;) <span class="w"> </span>} </pre></div> <p>The neat thing about MVCC is that beginning, committing, and aborting a transaction are all metadata work. Committing a transaction will get a bit more complex when we add support for Snapshot Isolation and Serializable Isolation, but we'll get to that later. Even then, it will not involve modifying any values we get, set, or delete.</p> <h3 id="get,-set,-delete">Get, set, delete</h3><p>Here is where things get fun. As mentioned earlier, the key-value store is actually <code>map[string][]Value</code>, with the more recent versions of a value at the end of the list of values for the key.</p> <p>For <code>get</code> support, we'll iterate backwards over the list of value versions for the key. And we'll call a special new <code>isvisible</code> method to determine if this transaction can see this value. 
The first value that passes the <code>isvisible</code> test is the correct value for the transaction.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return &quot;&quot;, err <span class="w"> </span> } <span class="gi">+ if command == &quot;get&quot; {</span> <span class="gi">+ c.db.assertValidTransaction(c.tx)</span> <span class="gi">+</span> <span class="gi">+ key := args[0]</span> <span class="gi">+</span> <span class="gi">+ c.tx.readset.Insert(key)</span> <span class="gi">+</span> <span class="gi">+ for i := len(c.db.store[key]) - 1; i &gt;= 0; i-- {</span> <span class="gi">+ value := c.db.store[key][i]</span> <span class="gi">+ debug(value, c.tx, c.db.isvisible(c.tx, value))</span> <span class="gi">+ if c.db.isvisible(c.tx, value) {</span> <span class="gi">+ return value.value, nil</span> <span class="gi">+ }</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ return &quot;&quot;, fmt.Errorf(&quot;cannot get key that does not exist&quot;)</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> // TODO <span class="w"> </span> return &quot;&quot;, fmt.Errorf(&quot;unimplemented&quot;) <span class="w"> </span>} </pre></div> <p>I snuck in tracking which keys are read, and we'll also soon sneak in tracking which keys are written. This is necessary in stricter isolation levels. More on that later.</p> <p><code>set</code> and <code>delete</code> are similar to <code>get</code>. 
But this time when we walk the list of value versions, we will set the <code>txEndId</code> for the value to the current transaction id if the value version is visible to this transaction.</p> <p>Then, for <code>set</code>, we'll append to the value version list with the new version of the value that starts at this current transaction.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return &quot;&quot;, err <span class="w"> </span> } <span class="gi">+ if command == &quot;set&quot; || command == &quot;delete&quot; {</span> <span class="gi">+ c.db.assertValidTransaction(c.tx)</span> <span class="gi">+</span> <span class="gi">+ key := args[0]</span> <span class="gi">+</span> <span class="gi">+ // Mark all visible versions as now invalid.</span> <span class="gi">+ found := false</span> <span class="gi">+ for i := len(c.db.store[key]) - 1; i &gt;= 0; i-- {</span> <span class="gi">+ value := &amp;c.db.store[key][i]</span> <span class="gi">+ debug(value, c.tx, c.db.isvisible(c.tx, *value))</span> <span class="gi">+ if c.db.isvisible(c.tx, *value) {</span> <span class="gi">+ value.txEndId = c.tx.id</span> <span class="gi">+ found = true</span> <span class="gi">+ }</span> <span class="gi">+ }</span> <span class="gi">+ if command == &quot;delete&quot; &amp;&amp; !found {</span> <span class="gi">+ return &quot;&quot;, fmt.Errorf(&quot;cannot delete key that does not exist&quot;)</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ c.tx.writeset.Insert(key)</span> <span class="gi">+</span> <span class="gi">+ // And add a new version if it&#39;s a set command.</span> <span class="gi">+ if command == &quot;set&quot; {</span> <span class="gi">+ value := args[1]</span> <span class="gi">+ c.db.store[key] = append(c.db.store[key], Value{</span> <span class="gi">+ txStartId: c.tx.id,</span> <span class="gi">+ txEndId: 0,</span> <span class="gi">+ value: value,</span> <span class="gi">+ })</span> <span class="gi">+</span> <span 
class="gi">+ return value, nil</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Delete ok.</span> <span class="gi">+ return &quot;&quot;, nil</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> if command == &quot;get&quot; { <span class="w"> </span> c.db.assertValidTransaction(c.tx) </pre></div> <p>This time rather than modifying the <code>readset</code> we modify the <code>writeset</code> for the transaction.</p> <p>And that is how commands get executed!</p> <p>Let's zoom in to the core of the problem we have mentioned but not implemented: MVCC visibility rules and how they differ by isolation level.</p> <h3 id="isolation-levels-and-mvcc-visibility-rules">Isolation levels and MVCC visibility rules</h3><p>To varying degrees, transaction isolation levels prevent concurrent transactions from messing with each other. The loosest isolation levels barely prevent this at all.</p> <p>Here is what the <a href="https://web.cecs.pdx.edu/~len/sql1999.pdf">1999 ANSI SQL standard</a> (page 84) has to say.</p> <p><img src="/sql99isolation.png" alt="/sql99isolation.png"></p> <p>But as I mentioned at the beginning of the post, we're going to be a bit informal. And we'll mostly refer to <a href="https://jepsen.io/consistency">Jepsen</a> summaries of each isolation level.</p> <h4 id="read-uncommitted">Read Uncommitted</h4><p>According to <a href="https://jepsen.io/consistency/models/read-uncommitted">Jepsen</a>, the loosest isolation level, Read Uncommitted, has almost no restrictions. 
We merely read the most recent (non-deleted) version of a value, regardless of whether the transaction that set it has committed, aborted, or is still in progress.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Database</span><span class="p">)</span><span class="w"> </span><span class="nx">isvisible</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">Transaction</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Read Uncommitted means we simply read the last value</span> <span class="w"> </span><span class="c1">// written. Even if the transaction that wrote this value has</span> <span class="w"> </span><span class="c1">// not committed, and even if it has aborted.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">isolation</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ReadUncommittedIsolation</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// We must merely make sure the value has not been</span> <span class="w"> </span><span class="c1">// deleted.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">txEndId</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">assert</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unsupported isolation level&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>Let's write a test that demonstrates this. We create two connections, <code>c1</code> and <code>c2</code>, each with its own transaction, and set a key in <code>c1</code>. The value set for the key in <code>c1</code> should be immediately visible if <code>c2</code> asks for that key. In <code>main_test.go</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;testing&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">TestReadUncommitted</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span> <span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ReadUncommittedIsolation</span> <span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span 
class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="c1">// Update is visible to self.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="s">&quot;c1 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But since read uncommitted, also available to everyone else.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// And if we delete, that should be respected.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;delete&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c1 delete x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c1 sees no x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c1 sees no x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 sees no x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span 
class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 sees no x&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p class="note"> Thank you to @glaebhoerl for <a href="https://twitter.com/glaebhoerl/status/1792912649304388043">pointing out</a> that in an earlier version of this post, Read Uncommitted incorrectly made deleted values visible. </p><p>That's pretty simple! But also pretty useless if your workload has conflicts. If you can arrange your workload in a way where you know no concurrent transactions will ever read or write conflicting keys though, this could be pretty efficient! The rules will only get more complex (and thus potentially more of a bottleneck) from here on.</p> <p>But for the most part, people don't use this isolation level. SQLite, Yugabyte, Cockroach, and Postgres <a href="https://github.com/ept/hermitage?tab=readme-ov-file#summary-of-test-results">don't even</a> implement it. It is also not the default for any major database that does implement it.</p> <p>Let's get a little stricter.</p> <h4 id="read-committed">Read Committed</h4><p>We'll pull again from <a href="https://jepsen.io/consistency/models/read-committed">Jepsen</a>:</p> <blockquote><p>Read committed is a consistency model which strengthens read uncommitted by preventing dirty reads: transactions are not allowed to observe writes from transactions which do not commit.</p> </blockquote> <p>This sounds pretty simple. In <code>isvisible</code> we'll make sure that the value has a <code>txStartId</code> that is either this transaction or a transaction that has committed. 
Moreover we will now begin checking against <code>txEndId</code> to make sure the value wasn't deleted by any relevant transaction.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return value.txEndId == 0 <span class="w"> </span> } <span class="gi">+ // Read Committed means we are allowed to read any values that</span> <span class="gi">+ // are committed at the point in time where we read.</span> <span class="gi">+ if t.isolation == ReadCommittedIsolation {</span> <span class="gi">+ // If the value was created by a transaction that is</span> <span class="gi">+ // not committed, and not this current transaction,</span> <span class="gi">+ // it&#39;s no good.</span> <span class="gi">+ if value.txStartId != t.id &amp;&amp;</span> <span class="gi">+ d.transactionState(value.txStartId).state != CommittedTransaction {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // If the value was deleted in this transaction, it&#39;s no good.</span> <span class="gi">+ if value.txEndId == t.id {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Or if the value was deleted in some other committed</span> <span class="gi">+ // transaction, it&#39;s no good.</span> <span class="gi">+ if value.txEndId &gt; 0 &amp;&amp;</span> <span class="gi">+ d.transactionState(value.txEndId).state == CommittedTransaction {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Otherwise the value is good.</span> <span class="gi">+ return true</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> assert(false, &quot;unsupported isolation level&quot;) <span class="w"> </span> return false <span class="w"> </span>} </pre></div> <p>This begins to look useful! 
We will never read a value that isn't part of a committed transaction (or isn't part of our own transaction). Indeed this is the <a href="https://github.com/ept/hermitage">default</a> isolation level for many databases including Postgres, Yugabyte, Oracle, and SQL Server.</p> <p>Let's add a test to <code>main_test.go</code>. This is a bit long, but give it a slow read. It is thoroughly commented.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestReadCommitted</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span> <span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ReadCommittedIsolation</span> <span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> 
</span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Local change is visible locally.</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c1 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Update not available to this transaction since this is not</span> <span class="w"> </span><span class="c1">// committed.</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span 
class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Now that it&#39;s been committed, it&#39;s visible in c2.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span 
class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Local change is visible locally.</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;yall&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;yall&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c3 get x&quot;</span><span class="p">)</span> <span class="w"> 
</span><span class="c1">// But not on the other commit, again.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;abort&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// And still not, if the other transaction aborted.</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span 
class="w"> </span><span class="c1">// And if we delete it, it should show up deleted locally.</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;delete&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> 
</span><span class="c1">// It should also show up as deleted in new transactions now</span> <span class="w"> </span><span class="c1">// that it has been committed.</span> <span class="w"> </span><span class="nx">c4</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c4 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c4 get x&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Again this seems great. However! 
You can easily get inconsistent data within a transaction at this isolation level. If transaction A has multiple statements, it can see different results per statement even if A itself did not modify any data: another transaction B may have committed changes between two of A's statements.</p> <p>Let's get a little stricter.</p> <h4 id="repeatable-read">Repeatable Read</h4><p>Again as Jepsen says, Repeatable Read is the same as Read Committed but with the following anomaly not allowed (quoting from the <a href="https://web.cecs.pdx.edu/~len/sql1999.pdf">ANSI SQL 1999 standard</a>):</p> <blockquote><p>P2 (“Non-repeatable read”): SQL-transaction T1 reads a row. SQL-transaction T2 then modifies or deletes that row and performs a COMMIT. If T1 then attempts to reread the row, it may receive the modified value or discover that the row has been deleted.</p> </blockquote> <p>To support this, we will add checks on top of the Read Committed logic so that creations and deletions made by transactions that started after this one, or that were still in progress when this one started, are ignored.</p> <p>As it happens, this is the same logic that will be necessary for Snapshot Isolation and Serializable Isolation.
The additional logic (that makes Snapshot Isolation and Serializable Isolation different) happens at commit time.</p> <div class="highlight"><pre><span></span><span class="w"> </span> return true <span class="w"> </span> } <span class="gd">- assert(false, &quot;unsupported isolation level&quot;)</span> <span class="gd">- return false</span> <span class="gi">+ // Repeatable Read, Snapshot Isolation, and Serializable</span> <span class="gi">+ // further restricts Read Committed so only versions from</span> <span class="gi">+ // transactions that completed before this one started are</span> <span class="gi">+ // visible.</span> <span class="gi">+</span> <span class="gi">+ // Snapshot Isolation and Serializable will do additional</span> <span class="gi">+ // checks at commit time.</span> <span class="gi">+ assert(t.isolation == RepeatableReadIsolation ||</span> <span class="gi">+ t.isolation == SnapshotIsolation ||</span> <span class="gi">+ t.isolation == SerializableIsolation, &quot;invalid isolation level&quot;)</span> <span class="gi">+ // Ignore values from transactions started after this one.</span> <span class="gi">+ if value.txStartId &gt; t.id {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Ignore values created from transactions in progress when</span> <span class="gi">+ // this one started.</span> <span class="gi">+ if t.inprogress.Contains(value.txStartId) {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // If the value was created by a transaction that is not</span> <span class="gi">+ // committed, and not this current transaction, it&#39;s no good.</span> <span class="gi">+ if d.transactionState(value.txStartId).state != CommittedTransaction &amp;&amp;</span> <span class="gi">+ value.txStartId != t.id {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span 
class="gi">+ // If the value was deleted in this transaction, it&#39;s no good.</span> <span class="gi">+ if value.txEndId == t.id {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Or if the value was deleted in some other committed</span> <span class="gi">+ // transaction that started before this one, it&#39;s no good.</span> <span class="gi">+ if value.txEndId &lt; t.id &amp;&amp;</span> <span class="gi">+ value.txEndId &gt; 0 &amp;&amp;</span> <span class="gi">+ d.transactionState(value.txEndId).state == CommittedTransaction &amp;&amp;</span> <span class="gi">+ !t.inprogress.Contains(value.txEndId) {</span> <span class="gi">+ return false</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ return true</span> <span class="w"> </span>} <span class="w"> </span>type Connection struct { </pre></div> <p>How do I derive these rules? Mostly by writing tests that should pass or fail and seeing what doesn't make sense. I tried to steal from existing projects but these rules were not so simple to discover. Which is part of what I hope makes this project particularly useful to look at.</p> <p>Let's write a test for Repeatable Read. 
Again, the test is long but well commented.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestRepeatableRead</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span> <span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">RepeatableReadIsolation</span> <span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Local change is visible locally.</span> <span 
class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c1 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Update not available to this transaction since this is not</span> <span class="w"> </span><span class="c1">// committed.</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span 
class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Even after committing, it&#39;s not visible in an existing</span> <span class="w"> </span><span class="c1">// transaction.</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span 
class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But is available in a new transaction.</span> <span class="w"> </span><span class="nx">c3</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c3 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Local change is visible locally.</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;yall&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;yall&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c3 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But not on the other commit, again.</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span 
class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;abort&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// And still not, regardless of abort, because it&#39;s an older</span> <span class="w"> </span><span class="c1">// transaction.</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// And again still the aborted set is still not on a new</span> <span class="w"> </span><span class="c1">// transaction.</span> <span class="w"> </span><span class="nx">c4</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c4 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;delete&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span 
class="w"> </span><span class="nx">c4</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But the delete is visible to new transactions now that this</span> <span class="w"> </span><span class="c1">// has been committed.</span> <span class="w"> </span><span class="nx">c5</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c5</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c5</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c5 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span 
class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c5 get x&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Let's get stricter!</p> <h4 id="snapshot-isolation">Snapshot Isolation</h4><p>Back to <a href="https://jepsen.io/consistency/models/snapshot-isolation">Jepsen</a> for a definition:</p> <blockquote><p>In a snapshot isolated system, each transaction appears to operate on an independent, consistent snapshot of the database. Its changes are visible only to that transaction until commit time, when all changes become visible atomically to any transaction which begins at a later time. If transaction T1 has modified an object x, and another transaction T2 committed a write to x after T1’s snapshot began, and before T1’s commit, then T1 must abort.</p> </blockquote> <p>So Snapshot Isolation is the same as Repeatable Read but with one additional rule: the keys written by any two concurrent committed transactions must not overlap.</p> <p>This is why we tracked <code>writeset</code>. Every time a transaction modified or deleted a key, we added it to the transaction's <code>writeset</code>. To make sure we abort correctly, we'll add a conflict check to the commit step. (This idea is also well documented in <a href="https://dl.acm.org/doi/abs/10.1145/2168836.2168853">A critique of snapshot isolation</a>. This paper can be hard to find. Email me if you want a copy.)</p> <p>When a transaction A goes to commit, it will run a conflict test for any transaction B that has committed since this transaction A started.</p> <p>Serializable Isolation is going to have a similar check. 
So we'll add a helper for iterating through all relevant transactions, running a check function for any transaction that has committed.</p> <div class="highlight"><pre><span></span>func (d *Database) hasConflict(t1 *Transaction, conflictFn func(*Transaction, *Transaction) bool) bool { <span class="w"> </span> iter := d.transactions.Iter() <span class="w"> </span> // First see if there is any conflict with transactions that <span class="w"> </span> // were in progress when this one started. <span class="w"> </span> inprogressIter := t1.inprogress.Iter() <span class="w"> </span> for ok := inprogressIter.First(); ok; ok = inprogressIter.Next() { <span class="w"> </span> id := inprogressIter.Key() <span class="w"> </span> found := iter.Seek(id) <span class="w"> </span> if !found { <span class="w"> </span> continue <span class="w"> </span> } <span class="w"> </span> t2 := iter.Value() <span class="w"> </span> if t2.state == CommittedTransaction { <span class="w"> </span> if conflictFn(t1, &amp;t2) { <span class="w"> </span> return true <span class="w"> </span> } <span class="w"> </span> } <span class="w"> </span> } <span class="w"> </span> // Then see if there is any conflict with transactions that <span class="w"> </span> // started and committed after this one started. 
<span class="w"> </span> for id := t1.id; id &lt; d.nextTransactionId; id++ { <span class="w"> </span> found := iter.Seek(id) <span class="w"> </span> if !found { <span class="w"> </span> continue <span class="w"> </span> } <span class="w"> </span> t2 := iter.Value() <span class="w"> </span> if t2.state == CommittedTransaction { <span class="w"> </span> if conflictFn(t1, &amp;t2) { <span class="w"> </span> return true <span class="w"> </span> } <span class="w"> </span> } <span class="w"> </span> } <span class="w"> </span> return false } </pre></div> <p>It was around this point that I decided I did really need a B-Tree implementation and could not just stick to vanilla Go data structures.</p> <p>Now we can modify <code>completeTransaction</code> to do this check if the transaction intends to commit. If the current transaction A's write set intersects with any other transaction B committed since transaction A started, we must abort.</p> <div class="highlight"><pre><span></span><span class="w"> </span>func (d *Database) completeTransaction(t *Transaction, state TransactionState) error { <span class="w"> </span> debug(&quot;completing transaction &quot;, t.id) <span class="gi">+</span> <span class="gi">+ if state == CommittedTransaction {</span> <span class="gi">+ // Snapshot Isolation imposes the additional constraint that</span> <span class="gi">+ // no transaction A may commit after writing any of the same</span> <span class="gi">+ // keys as transaction B has written and committed during</span> <span class="gi">+ // transaction A&#39;s life.</span> <span class="gi">+ if t.isolation == SnapshotIsolation &amp;&amp; d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {</span> <span class="gi">+ return setsShareItem(t1.writeset, t2.writeset)</span> <span class="gi">+ }) {</span> <span class="gi">+ d.completeTransaction(t, AbortedTransaction)</span> <span class="gi">+ return fmt.Errorf(&quot;write-write conflict&quot;)</span> <span class="gi">+ }</span> <span 
class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> // Update transactions. <span class="w"> </span> t.state = state <span class="w"> </span> d.transactions.Set(t.id, *t) </pre></div> <p>Lastly, the definition of <code>setsShareItem</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setsShareItem</span><span class="p">(</span><span class="nx">s1</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">],</span><span class="w"> </span><span class="nx">s2</span><span class="w"> </span><span class="nx">btree</span><span class="p">.</span><span class="nx">Set</span><span class="p">[</span><span class="kt">string</span><span class="p">])</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s1Iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span> <span class="w"> </span><span class="nx">s2Iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s2</span><span class="p">.</span><span class="nx">Iter</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">ok</span><span class="p">;</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">Next</span><span 
class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s1Key</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s1Iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()</span> <span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s2Iter</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nx">s1Key</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>Since Snapshot Isolation shares all the same visibility rules as Repeatable Read, the tests get to be a little simpler! We'll simply test that two transactions attempting to commit a write to the same key fail. 
Or specifically: that the second transaction cannot commit.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestSnapshotIsolation_writewrite_conflict</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span> <span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">SnapshotIsolation</span> <span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c3</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 commit&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;write-write conflict&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 commit&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But unrelated keys cause no conflict.</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;y&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;no conflict&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Not bad! 
But let's get stricter.</p> <p class="note note--edit"> Upon further discussion with Alex Miller, and after reviewing <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf">A Critique of ANSI SQL Isolation Levels</a>, the difference I am trying to suggest (between Repeatable Read and Snapshot Isolation) likely does not exist. A Critique of ANSI SQL Isolation Levels mentions Repeatable Read must not exhibit P4 (Lost Update) anomalies. And it mentions that you must check for read-write conflicts to avoid these. Therefore it seems likely that you can't easily separate Repeatable Read from Snapshot Isolation when implemented using MVCC. The differences between Repeatable Read and Snapshot Isolation may more readily show up when implementing transactions the classical way with Two-Phase Locking. <br /> <br /> To reiterate, with MVCC and optimistic concurrency control, correct implementations of Repeatable Read and Snapshot Isolation do not seem to be distinguishable. Both require write-write conflict detection. </p><h4 id="serializable-isolation">Serializable Isolation</h4><p>In terms of end-result, this is the simplest isolation level to reason about. Serializable Isolation must appear as if only a single transaction were executing at a time. Some systems, like SQLite and TigerBeetle, do Actually Serial Execution where only one transaction runs at a time. But few databases implement Serializable like this because it removes a number of fair concurrent execution histories. For example, two concurrent read-only transactions.</p> <p>Postgres implements serializability via <a href="https://drkp.net/papers/ssi-vldb12.pdf">Serializable Snapshot Isolation</a>. MySQL implements serializability via <a href="https://distributed-computing-musings.com/2022/02/transactions-two-phase-locking/">Two-Phase Locking</a>. 
FoundationDB implements serializability via <a href="https://apple.github.io/foundationdb/developer-guide.html">sequential timestamp assignment and conflict detection</a>.</p> <p>But the paper, <a href="https://dl.acm.org/doi/abs/10.1145/2168836.2168853">A critique of snapshot isolation</a>, provides a simple (though not necessarily efficient; I have no clue) approach via what they call Write Snapshot Isolation. In their algorithm, if the read set of one transaction intersects the write set of a concurrent committed transaction, the committing transaction should be aborted. (An overlap between two write sets alone is not treated as a conflict.) And this (plus Repeatable Read rules) is sufficient for Serializability.</p> <p>I leave it to that paper for the proof of correctness. In terms of implementing it though it's quite simple and very similar to the Snapshot Isolation we already mentioned.</p> <p>Inside of <code>completeTransaction</code> add:</p> <div class="highlight"><pre><span></span><span class="w"> </span> }) { <span class="w"> </span> d.completeTransaction(t, AbortedTransaction) <span class="w"> </span> return fmt.Errorf(&quot;write-write conflict&quot;) <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ // Serializable Isolation imposes the additional constraint that</span> <span class="gi">+ // no transaction A may commit after reading any of the same</span> <span class="gi">+ // keys as transaction B has written and committed during</span> <span class="gi">+ // transaction A&#39;s life, or vice-versa.</span> <span class="gi">+ if t.isolation == SerializableIsolation &amp;&amp; d.hasConflict(t, func(t1 *Transaction, t2 *Transaction) bool {</span> <span class="gi">+ return setsShareItem(t1.readset, t2.writeset) ||</span> <span class="gi">+ setsShareItem(t1.writeset, t2.readset)</span> <span class="gi">+ }) {</span> <span class="gi">+ d.completeTransaction(t, AbortedTransaction)</span> <span class="gi">+ return fmt.Errorf(&quot;read-write conflict&quot;)</span> <span class="w"> </span> } <span class="w"> </span> } 
</pre></div> <p>And if we add a test for read-write conflicts:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">TestSerializableIsolation_readwrite_conflict</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">testing</span><span class="p">.</span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">database</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newDatabase</span><span class="p">()</span> <span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">defaultIsolation</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">SerializableIsolation</span> <span class="w"> </span><span class="nx">c1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c3</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">database</span><span class="p">.</span><span class="nx">newConnection</span><span class="p">()</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;begin&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;hey&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">c1</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;get&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;x&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;cannot get key that does not 
exist&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 get x&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c2</span><span class="p">.</span><span class="nx">execCommand</span><span class="p">(</span><span class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 commit&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">assertEq</span><span class="p">(</span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">(),</span><span class="w"> </span><span class="s">&quot;read-write conflict&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;c2 commit&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// But unrelated keys cause no conflict.</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span class="s">&quot;set&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{</span><span class="s">&quot;y&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;no conflict&quot;</span><span class="p">})</span> <span class="w"> </span><span class="nx">c3</span><span class="p">.</span><span class="nx">mustExecCommand</span><span class="p">(</span><span
class="s">&quot;commit&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>We see it work! And that's it for a basic implementation of MVCC and major transaction isolation levels.</p> <h3 id="production-quality-testing">Production-quality testing</h3><p>There are two major projects I'm aware of that help you test transaction implementations: <a href="https://github.com/jepsen-io/elle">Elle</a> and <a href="https://github.com/ept/hermitage">Hermitage</a>. These are probably where I'd go looking if I were implementing this for real.</p> <p>This project took me long enough on its own, and I felt reasonably comfortable from my tests that the gist of my logic was right, so I did not test further. For that reason it surely has bugs.</p> <h3 id="vacuuming-and-cleanup">Vacuuming and cleanup</h3><p>One of the major things this implementation does not do is clean up old data. Eventually, older versions of values will be required by no transactions. They should be removed from the value version array. Similarly, eventually older transactions will be required by no transactions. They should be removed from the database transaction history list.</p> <p>Even if we had the vacuuming process in place though, what about some extreme use patterns? What if a key's value was always going to be 1GB long? And what if multiple transactions made only small changes to the 1GB data? We'd be duplicating a lot of the value across versions.</p> <p>It sounds less extreme when thinking about storing rows of data rather than key-value data. If a user has 100 columns and only updates one column a number of times, in our scheme we'd end up storing a ton of duplicate cell data for a row.</p> <p>This is a real-world issue in Postgres that was <a href="https://ottertune.com/blog/the-part-of-postgresql-we-hate-the-most">called out</a> by Andy Pavlo and the OtterTune folks. 
It turns out that Postgres alone among major databases stores the entire value for every version. In contrast other major databases like MySQL store a diff.</p> <h3 id="conclusion">Conclusion</h3><p>This post only begins to demonstrate that database behavior differs quite a bit both in terms of results and in terms of optimizations. Everyone implements the ideas differently and to varying degrees.</p> <p>Moreover, we have only begun to implement the behavior a real SQL database supports. For example, how do visibility rules and conflict detection work with range queries? What about sub-transactions, and save points? These will have to be covered another time.</p> <p>Hopefully seeing this simple implementation of MVCC and visibility rules helps to clarify at least some of the basics.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here&#39;s a new post walking through an implementation of MVCC and major SQL transaction isolation levels, in 400 lines of Go code.<br><br>These ideas might sound esoteric, but they impact almost every developer using any database.<a href="https://t.co/crFKM74R5h">https://t.co/crFKM74R5h</a> <a href="https://t.co/o9awTPpvvx">pic.twitter.com/o9awTPpvvx</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1791225675287867742?ref_src=twsrc%5Etfw">May 16, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-05-16-mvcc.htmlThu, 16 May 2024 00:00:00 +0000What makes a great technical bloghttp://notes.eatonphil.com/2024-04-10-what-makes-a-great-tech-blog.html<p>I want to explain why the blogs in <a href="https://lists.eatonphil.com/blogs.html">My favorite technical blogs</a> are my favorite. That page is solely about non-corporate tech blogs. So this post is too. 
I'll have to make another list for favorite corporate tech blogs.</p> <p>In short, they:</p> <ul> <li>Tackle hard and confusing topics</li> <li>Show working code</li> <li>Make things simpler</li> <li>Write regularly</li> <li>Talk about tradeoffs and downsides</li> <li>Avoid internet slang, memes, swearing, sarcasm, and ranting</li> </ul> <h3 id="tackle-hard-and-confusing-topics">Tackle hard and confusing topics</h3><p>There are a number of problems in programming and computer science where otherwise knowledgeable programmers start mumbling, or revert to cliches or group-think, because they aren't sure.</p> <p>These are the best topics you can possibly dive deep into. And my favorite writers do exactly this.</p> <p>They write about durability guarantees of disks and filesystems. They write about common pitfalls in benchmarking. They write about database consistency anomalies. They write about threading and IO models.</p> <p>And they write about it by showing concrete examples and concrete logic so you can learn how to stop handwaving on the topic.</p> <p>Their writing helps you come out with a useful mental model you can apply to your own problems.</p> <p>And you know, sometimes it's not about the topic being obscure. Good writers have the ability to tackle a boring topic in an interesting light. Maybe by digging deeper into a root cause. Or showing you the history behind the scenes.</p> <p>Moreover, my favorite writers don't know everything. But they also don't pretend to know everything. They're quick to admit they don't understand something and ask for help from their readers.</p> <h3 id="show-working-code">Show working code</h3><p>I love to see complete working code in a post. In contrast there are many projects that start out simple and people write an article that covers the project at a high level. 
But they keep working on the project and it becomes more complex.</p> <p>It's not always easy to follow commits over time.</p> <p>Eli Bendersky and Serge Zaitsev are particularly great at developing small but meaningful projects in a single post or short series.</p> <p>On the other hand, if people only did this, we wouldn't hear about the development of long-running projects like V8 or Postgres. So I guess this style has limits. And I don't penalize people talking about long-running projects for not showing working code.</p> <h3 id="make-things-simpler">Make things simpler</h3><p>One of the marks of a good writer is that you can make complex topics simple. And not just by being reductive. Though sometimes even being reductive is useful for education.</p> <p>In contrast I sometimes see articles by less experienced writers and I marvel at how they make a simple topic so complex. I recognize this because I was <em>absolutely</em> like that 10 years ago, if not 5 years ago.</p> <h3 id="write-regularly">Write regularly</h3><p>My favorite blogs typically get a new post at least once a month. Some people, like Murat, write once a week.</p> <p>I think the practice probably does improve your writing but mostly it's that they keep my attention by publishing regularly!</p> <h3 id="talk-about-tradeoffs-and-downsides">Talk about tradeoffs and downsides</h3><p>Nothing builds trust like talking about the issues with something you built. No project is perfect. And to ignore the downsides risks seeming like you don't know or understand them.</p> <p>So the writers I like the most talk about decisions in context. 
They talk about the good and the bad.</p> <h3 id="avoid-internet-slang,-memes,-swearing,-sarcasm,-and-ranting">Avoid internet slang, memes, swearing, sarcasm, and ranting</h3><p>There's no way I can think of talking about this without sounding super lame.</p> <p>One thing I've noticed, particularly among younger colleagues, is the use of memes, swearing, 4chan slang, or sarcasm. I used to write like this 10 years ago too.</p> <p>There is a chunk of your audience who won't care. The problem is that there's also a chunk of your (potential) audience who definitely does care. There's even a chunk of your audience who may not care but just won't understand (e.g. non-native English speakers).</p> <p>I have friends and folks I respect who write very well. But they are also overly, unnecessarily edgy when they write. I don't like sharing these posts because I don't want to unnecessarily offend or turn off people either.</p> <h3 id="closing-thoughts">Closing thoughts</h3><p>It would be boring if everyone wrote the same way. I'm glad the internet is fun and weird. 
But I wanted to share a few things that go into my favorite technical blogs that I'm always happy to refer people to.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short post on what I think makes a great technical blog.<a href="https://t.co/QRFtQyQyU5">https://t.co/QRFtQyQyU5</a> <a href="https://t.co/QpsQC90EX5">pic.twitter.com/QpsQC90EX5</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1778184061447774328?ref_src=twsrc%5Etfw">April 10, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-04-10-what-makes-a-great-tech-blog.htmlWed, 10 Apr 2024 00:00:00 +0000A paper reading club at work; databases and distributed systems researchhttp://notes.eatonphil.com/2024-04-05-company-paper-club.html<p>I started a paper reading club this week at work, focused on databases and distributed systems research. I posted in a general channel about the premise and asked if anyone was interested. I clarified the intended topic and that discussions would be asynchronous over email, run fortnightly.</p> <p>Weekly seemed way too often. A month seemed way too infrequent. Fortnightly seemed decent.</p> <p>I was nervous to do this because I've been here about 2 months. In the past I would have waited 6 months or a year to do this. But I don't know. If you see something you think should exist, why wait?</p> <p>The only other consideration was <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">past experiences I've written about</a> having difficulty getting engagement with clubs at work. But EDB has near 1,000 employees. I figured there might at least be a couple interested.</p> <p>Furthermore I figured if I only got a few people this entire idea would at least benefit myself, since I have been wanting to force myself to build a paper reading habit. 
And if no one responded, it would be only mildly embarrassing and I'd not pursue it further.</p> <p>But after a day, about 6 people showed interest. Which was better than I hoped! Folks from product management, support, development, and beyond.</p> <p>So I opened a dedicated channel and asked people to start submitting papers and voting on them. One of my teammates started submitting some great papers on caches and reference counting.</p> <p>I picked a first one, the Redshift paper, to get us started and to demonstrate the process to avoid confusion. And I made a calendar invite for everyone in the channel, the paper linked in the invite. I clarified in the invite that it was just a reminder and that the real discussion would still be async over email. (I've found it's best to repeatedly clarify process stuff.)</p> <p>Once I had these first few folks interested I was able to post again in a broader company channel that a couple of us were starting this paper club. By the end of the day the dedicated channel had 29 folks. All in about 2 days.</p> <p>Mailing lists are nicer than Slack or Discord in my opinion because they sort of force you to slow down, messages are harder to miss (if someone starts a thread after you've seen a message in Slack or Discord, you tend to miss it), and they are easier to manage (read/unread).</p> <p>Engineers often seem to get overwhelmed by a mass of Slack messages. Whereas they seem to be a bit more comfortable with email threads.</p> <p>All of this is all the more important when you're running a global group. EDB has people everywhere.</p> <p>Why do this?</p> <p>Before I dropped out of college I did a research internship with a VLSI group at Harvard SEAS. And my favorite part was that they had a weekly (or biweekly?) Wednesday paper reading session where 15 people from the lab and adjacent labs would eat pizza after hours and discuss a paper.</p> <p>I've been dying to recreate this at a company ever since. 
Since EDB is so distributed, we won't be discussing over pizza. But I'm still excited.</p> <p>And I hope my experience serves as a blueprint for others.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I started a paper reading club at work, wrote about it as a possible blueprint for others.<br><br>I&#39;m excited! I&#39;ve wanted to have a gang at work with whom to read papers for a long time.<a href="https://t.co/vpwERj8pHe">https://t.co/vpwERj8pHe</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1776415593173782938?ref_src=twsrc%5Etfw">April 6, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-04-05-company-paper-club.htmlFri, 05 Apr 2024 00:00:00 +0000Finding memory leaks in Postgres C codehttp://notes.eatonphil.com/2024-03-27-finding-memory-leaks-in-postgres-c-code.html<head> <meta http-equiv="refresh" content="4;URL='https://www.enterprisedb.com/blog/finding-memory-leaks-postgres-c-code'" /> </head><p>This is an external post of mine. Click <a href="https://www.enterprisedb.com/blog/finding-memory-leaks-postgres-c-code">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2024-03-27-finding-memory-leaks-in-postgres-c-code.htmlWed, 27 Mar 2024 00:00:00 +0000Zig, Rust, and other languageshttp://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.html<p>Having worked a bit in Zig, Rust, Go and now C, I think there are a few common topics worth having a fresh conversation on: automatic memory management, the standard library, and explicit allocation.</p> <p>Zig is not a mature language. But it has made enough useful choices for a number of companies to invest in it and run it in production. The useful choices make Zig worth talking about.</p> <p>Go and Rust are mature languages. 
But they have both made questionable choices that seem worth talking about.</p> <p>All of these languages are developed by highly intelligent folks I personally look up to. And your choice to use any one of these is certainly fine, whichever it is.</p> <p>The positive and negative choices particular languages made, though, are worth talking about as we consider what a systems programming language 10 years from now would look like. Or how these languages themselves might evolve in the next 10 years.</p> <p>My perspective is mostly building distributed databases. So the points that I bring up may have no relevance to the kind of work you do, and that's alright. Moreover, I'm already aware most of these opinions are not shared by the language maintainers, and that's ok too. I am not writing to convince anyone.</p> <h3 id="automatic-memory-management">Automatic memory management</h3><p>One of my bigger issues with Zig is that it doesn't support RAII. You can defer cleanup to the end of a block; and this is half of the problem. But only RAII will allow for smart pointers and automatic (not manual) reference counting. RAII is an excellent option to default to, but in Zig you aren't allowed to. In contrast, even C "supports" automatic cleanup (via compiler extensions).</p> <p>But most of the time, arenas are fine. Postgres is written in C and memory is almost entirely managed through nested arenas (called "memory contexts") that get cleaned up when some subset of a task finishes, recursively. Zig has builtin support for arenas, which is great.</p> <h3 id="standard-library">Standard library</h3><p>It seems regrettable that some languages have been shipping smaller standard libraries. 
Smaller standard libraries seem to encourage users of the language to install more transitively-unvetted third-party libraries, which increases build time and build flakiness, and which increases bitrot over time as unnecessary breaking changes occur.</p> <p>People have been making jokes about <code>node_modules</code> for a decade now, but this problem is just as bad in Rust codebases I've seen. And to a degree it happens in Java and Go as well, though their larger standard libraries allow you to get further without dependencies.</p> <p>Zig has a good standard library, which may be Go and Java tier in a few years. But one goal of their package manager seemed to be to allow the standard library to be broken up; made smaller. For example, JSON support moving out of the standard library into a package. I don't know if that is actually the planned direction. I hope not.</p> <p>Having a large standard library doesn't mean that the programmer shouldn't be able to swap out implementations easily as needed. But all that is required is for the standard library to define an <strong>interface</strong> along with the standard library implementation.</p> <p>The small size of the standard library doesn't just affect developers using the language, it even encourages developers of the language itself to depend on libraries owned by individuals.</p> <p>Take a look at the transitive dependencies of an official Node.js package like <a href="https://github.com/nodejs/node-gyp/blob/main/package.json#L25">node-gyp</a>. Is it really the ideal outcome of a small standard library to encourage dependence in official libraries on libraries owned by individuals, like <a href="https://github.com/sindresorhus/env-paths">env-paths</a>, that haven't been modified in 3 years? 68 lines of code. Is it not safer at this point to vendor that code? i.e. 
copy the <code>env-paths</code> code into <code>node-gyp</code>.</p> <p>Similarly, if you go looking for compression support in Rust, there's none in the standard library. But you may notice the <a href="https://github.com/rust-lang/flate2-rs">flate2-rs</a> repo under the official <a href="https://github.com/rust-lang">rust-lang</a> GitHub namespace. If you look at its transitive dependencies: <a href="https://github.com/rust-lang/flate2-rs/blob/main/Cargo.toml#L23">flate2-rs</a> depends on (an individual's) <a href="https://github.com/Frommi/miniz_oxide/blob/master/miniz_oxide/Cargo.toml#L20">miniz_oxide</a> which depends on (an individual's) <a href="https://github.com/jonas-schievink/adler">adler</a> that hasn't been updated in 4 years. 300 lines of code including tests. Why not vendor this code? It's the habits a small standard library builds that seem to encourage everyone not to.</p> <p>I don't mean these necessarily constitute a supply-chain risk. I'm not talking about <a href="https://www.theregister.com/2016/03/23/npm_left_pad_chaos/">left-pad</a>. But the pattern is sort of clear. Even official packages may end up depending on external party packages, because the commitment to a small standard library meant omitting stuff like compression, checksums, and common OS paths.</p> <p>It's a tradeoff and maybe makes the job of the standard library maintainer easier. But I don't think this is the ideal situation. Dependencies are useful but should be kept to a reasonable minimum.</p> <p>Hopefully languages end up more like Go than like Rust in this regard.</p> <h3 id="explicit-allocation">Explicit allocation</h3><p>When folk discuss the Zig standard library's pattern of requiring an allocator argument for every method that allocates, they often talk about the benefit of swapping out allocators or the benefit of being able to handle OOM failures.</p> <p>Both of these seem pretty niche to me. 
For example, in Zig tests you are encouraged to pass around a debug allocator that tells you about memory leaks. But this doesn't seem too different from compiling a C project with a debug allocator or compiling with different sanitizers on and running tests against the binary produced. In both cases you mostly deal with allocators at a global level depending on the environment you're running the code in (production or tests).</p> <p>The real benefit of explicit allocations to me is much more trivial. You basically can't code a method in Zig without acknowledging allocations.</p> <p>This is particularly useful for hotpath code. Take an iterator for example. It has a <code>new()</code> method, a <code>next()</code> method, and a <code>done()</code> method. In most languages, it's basically impossible at the syntax or compiler-level to know if you are allocating in the <code>next()</code> method. You may know because you know the behavior of all the code in <code>next()</code> by heart. But that won't happen all the time.</p> <p>Zig is practically alone in that if you write the <code>next()</code> method and don't pass an allocator to any method in the <code>next()</code> body, nothing in that <code>next()</code> method will allocate.</p> <p>In any other language it might not be until you run a profiler that you notice an allocation that should have been done once in <code>new()</code> accidentally ended up in <code>next()</code> instead.</p> <p>On the other hand, for all the same reasons, writing Zig is kind of a pain because everything takes an allocator!</p> <p>Explicit allocation is not intrinsic to Zig, the language. It is a convention that is prevalent in the standard library. There is still a global allocator and any user of Zig could decide to use the global allocator. At which point you've got implicit allocation. 
So explicit allocation as a convention isn't a perfect solution.</p> <p>But it, by default, gives you a level of awareness of allocations you just can't get from typical Go or Rust or C code, depending on the project's practices. Perhaps it's possible to switch off the Go, Rust and C standard library and use one where all functions that allocate do require an allocator.</p> <p>But explicitly passing allocators is still sort of a visual hack.</p> <p>I think the ideal situation in the future will be that every language supports annotating blocks of code as <code>must-not-allocate</code> or something along those lines. Either the compiler will enforce this and fail if you seem to allocate in a block marked <code>must-not-allocate</code>, or it will panic during runtime so you can catch this in tests.</p> <p>This would be useful beyond static programming languages. It would be as interesting to annotate blocks in JavaScript or Python as <code>must-not-allocate</code> too.</p> <p>Otherwise the current state of things is that you'd normally configure this sort of thing at the global level. Saying "there must not be any allocations in this entire program" just doesn't seem as useful in general as being able to say "there must not be any allocations in this one block".</p> <h4 id="optional,-not-required,-allocator-arguments">Optional, not required, allocator arguments</h4><p>Rust has nascent support for passing an allocator to methods that allocate. But it's optional. From what I understand, C++ STL is like this too.</p> <p>These are both super useful for programming extensions. And it's one of the reasons I think Zig makes a ton of sense for Postgres extensions specifically. 
Because it was only ever built for running in an environment with someone else's allocator.</p> <h3 id="praise-for-zig,-rust,-and-go-tooling">Praise for Zig, Rust, and Go tooling</h3><p>All three of these have really great first-party tooling including build system, package management, test runners and formatters. The idea that the language should provide a great environment to code in (end-to-end) makes things simpler and nicer for programmers.</p> <h3 id="meandering-non-conclusion">Meandering non-conclusion</h3><p>Use the language you want to use. Zig and Rust are both nice alternatives to writing vanilla C.</p> <p>On the other hand, I've been pleasantly surprised writing Postgres C by how high level it is. It's almost a separate language since you're often dealing with user-facing constructs, like Postgres's Datum objects which represent what you might think of as a cell in a Postgres database. And you can use all the same functions provided for Postgres SQL for working with Datums, but from C.</p> <p>I've also been able to work a bit on Postgres extensions in Rust with <a href="https://github.com/pgcentralfoundation/pgrx">pgrx</a> lately, which I hope to write about soon. And when I saw <a href="https://github.com/xataio/pgzx">pgzx</a> for writing Postgres extensions in Zig I was excited to spend some time with that too.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post on my wishlist for Zig and Rust. 
Focused on automatic memory management, the standard library, and explicit allocation.<a href="https://t.co/dvynizU9V2">https://t.co/dvynizU9V2</a> <a href="https://t.co/iTXp5QVxj0">pic.twitter.com/iTXp5QVxj0</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1768725864923931033?ref_src=twsrc%5Etfw">March 15, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-03-15-zig-rust-and-other-languages.htmlFri, 15 Mar 2024 00:00:00 +0000First month on a database teamhttp://notes.eatonphil.com/2024-03-11-first-month-on-a-database-team.html<p><!-- -*- mode: markdown -*- --></p> <p>A little over a month ago, I joined EnterpriseDB on a distributed Postgres product (<a href="https://enterprisedb.com/docs/pgd">PGD</a>). The process of onboarding myself has been pretty similar at each company in the last decade, though I think I've gotten better at it. The process is of course influenced by the team, and my coworkers have been excellent. Still, I wanted to share my thought process and personal strategies.</p> <h3 id="avoid,-at-first,-what-is-always-challenging">Avoid, at first, what is always challenging</h3><p>Trickier things at companies are the people, organization, and processes. What code exists? How does it work together? Who owns what? How can I find easy code issues to tackle? How do I know what's important (so I can avoid picking it up and becoming a bottleneck).</p> <p>But also, in the first few days or weeks you aren't exactly expected to contribute meaningfully to features or bugs. Your sprint contributions are not tracked too closely.</p> <p>The combination of 1) what to avoid and 2) the sprint-freedom-you-have leads to a few interesting and valuable areas to work on on your own: the build process, tests, running the software, and docs.</p> <p>But code need not be ignored either. 
Some frequent areas to get your first code contributions in include user configuration code, error messages, and stale code comments.</p> <p>What follows are some little 1st day, 1st week, 1st month projects I went through to bootstrap my understanding of the system.</p> <h3 id="build-process">Build process</h3><p>First off, where is the code and how do you build it? This requires you to have all the relevant dependencies. Much of my work is on a Postgres extension. This meant having a local Postgres development environment, having gcc, gmake (on mac), Perl, and so on. And furthermore, PGD is a pretty mature product so it supports building against multiple Postgres distributions. Can I build against all of them?</p> <p>The easiest situation is when there are instructions for all of this, linked directly from your main repo. When I started, the instructions did exist but in a variety of places. So over the first week I started collecting all of what I had learned about building the system, with dependencies, across distributions, and with various important flags (debug mode on, asserts enabled, etc.). I finished the first week by writing a little internal blog post called "Hacking on PGD".</p> <p>I hadn't yet figured out the team processes so I didn't want to bother anyone by trying to get this "blog post" committed anywhere yet as official internal documentation. Maybe there already was a good doc, I just hadn't noticed it yet. So I just published it in a private Confluence page and shared it in the private team slack. If anyone else benefited from it, great! Otherwise, I knew I'd want to refer back to it.</p> <p>This is an important attitude I think. It can be hard to tell what others will benefit from. If you get into the habit of writing things down internally for your own sake, but making it available internally, it is likely others will benefit from it too. 
This is something I've learned from years of blogging publicly outside of work.</p> <p>Moreover, the simple act of writing a good post establishes you as something of an authority. This is useful for yourself if for no one else.</p> <h4 id="writing-a-good-post">Writing a good post</h4><p>Let's get distracted here for a second. One of the most important things I think in documentation is documenting not just what does exist but what doesn't. If you had to take a path to get something to work, did you try other paths that didn't work? It can be extremely useful to figure out what <em>exactly</em> is required for something.</p> <p>Was there a flag that you tried to build with but you didn't try building without it? Well try again without it and make sure it was necessary. Was there some process you executed where the build succeeded but you can't remember if it was actually necessary for the build to succeed?</p> <p>It's difficult to explain why I think this sort of precision is useful but I'm pretty sure it is. Maybe because it builds the habit of not treating things as magic when you can avoid it. It builds the habit of asking questions (if only to yourself) to understand and not just to get by.</p> <h4 id="static-analysis?-dynamic-analysis?">Static analysis? Dynamic analysis?</h4><p>Going back to builds, another aspect to consider is static and dynamic analysis. Are there special steps to using gdb or valgrind or other analyzers? Are you using them already? Can you get them running locally? Has any of this been documented?</p> <p>Maybe the answer to all of those is yes, or maybe none of those are relevant but there are likely similar tools for your ecosystem. If analysis tools are relevant and no one has yet explored them, that's another very useful area to explore as a newcomer.</p> <h3 id="testing">Testing</h3><p>After I got the builds working, I felt the obvious next step was to run tests. But what tests exist? Are there unit tests? Integration tests? 
Anything else? Moreover, is there test coverage? I was certain I'd be able to find some low-hanging contributions to make if I could find some files with low test coverage.</p> <p>Alas, my certainty hit the wall in that there were in fact too many types of integration tests that all do provide coverage already. They just don't all <em>report</em> coverage.</p> <p>The easiest ways to report coverage (with gcov) were only reporting coverage for certain integration tests that we run locally. There are more integration tests run in cloud environments and getting coverage reports there to merge with my local coverage files would have required more knowledge of people and processes, areas that I wanted not to be forced to think about too quickly.</p> <p>So coverage wasn't a good route to go. But around this time, I noticed a ticket that asked for a simple change to user configuration code. I was able to make the change pretty quickly and wanted to add tests. We have our own test framework built on top of Postgres's powerful Perl test framework. But it was a little difficult to figure out how to use either of them.</p> <p>So I copied code from other tests and pared it down until I got the smallest version of test code I could get. This took maybe a day or two of tweaking lines and rerunning tests since I didn't understand everything that was/wasn't required. Also it's Perl and I've never written Perl before so that took a bit of time and ChatGPT. (Arrays, man.)</p> <p>In the end though I was able to collect my learnings into another internal confluence post just about how to write tests, how to debug tests, how to do common things within tests (for example, ensuring a Postgres log line was outputted), etc. I published this post as well and shared it in the team Slack.</p> <h3 id="running">Running</h3><p>I had PGD built locally and was able to run integration tests locally, but I still hadn't gotten a cluster running. 
Nor played with the eventual consistency demos I knew we supported. We had a great quickstart that ran through all the manual steps of getting a two-node cluster up. This was a distillation, for devs, of a more elaborate process we give to customers in a production-quality script.</p> <p>But I was looking for something in between a production-quality script and manually initializing a local cluster. And I also wanted to practice my understanding of our test process. So I ported our quickstart to our integration test framework and made a PR with this new test, eventually merging this into the repo. And I wrote a minimal Python script for bringing up a local cluster. I've got an open PR to add this script to the repo. Maybe I'll learn though that a simple script such as this does already exist, and that's fine!</p> <h3 id="docs">Docs</h3><p>The entire time, as I'd been trying to build and test and run PGD, I was trying to understand our terminology and architecture by going through our public docs. I had a lot of questions coming out of this I'd ask in the team channel.</p> <p>Not to toot my horn but I think it's somewhat of a superpower to be able/willing to ask "dumb questions" in a group setting. That's how I frame it anyway. "Dumb question: what does X mean in this paragraph?" Or, "dumb question: when we say performance improvement because of Y, what is the intuition here?" Because of the time spent here, I was able to make a few more docs contributions as I read through the docs as well.</p> <p>You have to balance where you ask your dumb questions though. Asking dumb questions to one person doesn't benefit the team. But asking dumb questions in too wide a group is sometimes bad politics. Asking "dumb questions" in front of your team seems to have the best bang for buck.</p> <p>But maybe the more important contributions were, when I got more comfortable with the team, proposing to merge my personal, internal Confluence blog posts into the repo as docs. 
I think in a number of cases, what I wrote about indeed hadn't been concisely collected before and thus was useful to have as team documentation.</p> <p>Even more challenging was trying to distill (a chunk of) the internal architecture. Only after following many varied internal docs and videos, and following through numerous code paths, was I able to propose an architecture diagram outlining major components and communication between them, with their differing formats (WAL records, internal enums, etc.) and means of communication (RPC, shared memory, etc.). This architecture diagram is still in review and may be totally off. But it's already helped at least me think about the system.</p> <p>In most cases this was all information that the team had already written or explained but just bringing it together and summarizing provided a different useful perspective I think. Even if none of the docs got merged it still helped to build my own understanding.</p> <h3 id="beyond-the-repo">Beyond the repo</h3><p>Learning the project is just one aspect of onboarding. Beyond that I joined the #cats channel, the #dogs channel, found some fellow New Yorkers and opened a NYC channel, and tried to find Zoom-time with the various people I'd see hanging around common team Slack channels. Trying to meet not just devs but support folk, product managers, marketing folk, sales folk, and anyone else!</p> <p>Walking the line between scouring our docs and GitHub and Confluence and Jira on my own, and bugging people with my incessant questions.</p> <p>I've enjoyed my time at startups. I've been a dev, a manager, a founder, a cofounder. But I'm incredibly excited to be back, at a bigger company, full-time as a developer hacking on a database!</p> <p>And what about you? 
What do you do to onboard yourself at a new company or new project?</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I&#39;ve been having an absolute blast in my first month at EDB and I wanted to share a few of my strategies for onboarding myself on a database team. Strategies broadly applicable for devs on a new team/project.<a href="https://t.co/TS5qRLysuA">https://t.co/TS5qRLysuA</a> <a href="https://t.co/lvuxDBQJwx">pic.twitter.com/lvuxDBQJwx</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1767371003527672237?ref_src=twsrc%5Etfw">March 12, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-03-11-first-month-on-a-database-team.htmlMon, 11 Mar 2024 00:00:00 +0000An intuition for distributed consensus in OLTP systemshttp://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html<p><!-- -*- mode: markdown -*- --></p> <p>Distributed consensus in transactional databases (e.g. etcd or Cockroach) is a big deal these days. Most often under the hood are variations of log-based Paxos-like algorithms such as MultiPaxos, Viewstamped Replication, or Raft. While there are new variations that come out each year, optimizing for various workloads, these algorithms are fairly standard and well-understood.</p> <p>In fact they are used in so many places, Kubernetes for example, that even if you don't decide to implement Raft (which is fun and I encourage it), it seems worth building an intuition for distributed consensus.</p> <p>What happens as you tweak a configuration. What happens as the production environment changes. 
Or what to reach for as product requirements change.</p> <p>I've been <a href="https://notes.eatonphil.com/2023-05-25-raft.html">thinking</a> <a href="https://eatonphil.com/2023-ddia.html">about</a> the <a href="https://eatonphil.com/2023-database-internals.html">basics</a> of <a href="https://github.com/eatonphil/raft-rs">distributed consensus</a> recently. There has been a lot to digest and characterize. And I'm only beginning to get an understanding.</p> <p>This post is an attempt to share some of the intuition built up reading about and working in this space. Originally this post was also going to end with a walkthrough of my <a href="https://github.com/eatonphil/raft-rs">most recent</a> Raft implementation in Rust. But I'm going to hold off on that for another time.</p> <p>I was fortunate to have a few excellent reviewers looking at versions of this post: Paul Nowoczynski, Alex Miller, Jack Vanlightly, Daniel Chia, and Alex Petrov. Thank you!</p> <p>Let's start with Raft.</p> <h3 id="raft">Raft</h3><p>Raft is a distributed consensus algorithm that allows you to build a replicated state machine on top of a replicated log.</p> <p>A Raft library handles replicating and durably persisting a sequence (or <i>log</i>) of commands to at least a majority of nodes in a cluster. You provide the library a state machine that interprets the replicated commands. From the perspective of the Raft library, commands are just opaque byte strings.</p> <p>For example, you could build a replicated key-value store out of <code>SET</code> and <code>GET</code> commands that are passed in by a client. You provide a Raft library state machine code that interprets the Raft log of <code>SET</code> and <code>GET</code> commands to modify or read from an in-memory hashtable. 
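</p> <p>As a minimal sketch of that idea, assuming nothing about any real Raft library's API, here is a hypothetical state machine in Python. The <code>KVStateMachine</code> class, its <code>apply</code> method, and the space-separated command encoding are all assumptions made for illustration. The only point is that the consensus layer hands over opaque bytes, in log order, and the state machine gives them meaning.</p>

```python
class KVStateMachine:
    """Hypothetical state machine: interprets opaque command bytes from the log."""

    def __init__(self):
        self.table = {}  # the in-memory hashtable

    def apply(self, command: bytes):
        # Assumed encoding (not from any real library): b"SET key value" or b"GET key".
        parts = command.decode("utf-8").split(" ", 2)
        if parts[0] == "SET" and len(parts) == 3:
            self.table[parts[1]] = parts[2]
            return None
        if parts[0] == "GET" and len(parts) == 2:
            return self.table.get(parts[1])
        raise ValueError(f"unknown command: {command!r}")


# Stand-in for a committed Raft log; a real library would deliver these entries.
log = [b"SET a 1", b"SET b 2", b"GET a", b"SET a 3", b"GET a"]

sm = KVStateMachine()
results = [sm.apply(entry) for entry in log]
print(results)  # [None, None, '1', None, '3']
```

<p>Because every node applies the same commands in the same order, every node's hashtable ends up in the same state.</p> <p>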
You can find concrete examples of exactly this replicated key-value store modeling in <a href="https://notes.eatonphil.com/tags/raft.html">previous Raft posts</a> I've written.</p> <p>All nodes in the cluster run the same Raft code (including the state machine code you provide); communicating among themselves. Nodes elect a semi-permanent leader that accepts all reads and writes from clients. (Again, reads and writes are modeled as commands).</p> <p>To commit a new command to the cluster, clients send the command to all nodes in the cluster. Only the leader accepts this command, if there is currently a leader. Clients retry until there is a leader that accepts the command.</p> <p>The leader appends the command to its log and makes sure to replicate all commands in its log to followers in the same order. The leader sends periodic heartbeat messages to all followers to prolong its term as leader. If a follower hasn't heard from the leader within a period of time, it becomes a candidate and requests votes from the cluster.</p> <p>When a follower is asked to accept a new command from a leader, it checks if its history is up-to-date with the leader. If it is not, the follower rejects the request and asks the leader to send previous commands to bring it up-to-date. It does this ultimately, in the worst case of a follower that has lost all history, by going all the way back to the very first command ever sent.</p> <p>When a quorum (typically a majority) of nodes has accepted a command, the leader marks the command as committed and applies the command to its own state machine. 
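</p> <p>A sketch of that commit rule, under assumptions: the leader tracks, per node, the highest log index that node has accepted (the hypothetical <code>match_index</code> list below, which includes the leader itself), and the committed index is the highest index accepted by a majority. Raft as specified adds a further condition, that the entry at that index must be from the leader's current term, which this sketch leaves out.</p>

```python
def committed_index(match_index):
    """Return the highest log index stored on a majority of nodes."""
    quorum = len(match_index) // 2 + 1
    # Sorted descending, the quorum-th largest value is held by at least
    # `quorum` nodes, so every entry up to it is safely replicated.
    return sorted(match_index, reverse=True)[quorum - 1]


# Five nodes: three of them have accepted everything up to index 5.
five_node = committed_index([7, 7, 5, 3, 2])
# Three nodes: two of them have accepted everything up to index 2.
three_node = committed_index([4, 2, 1])
print(five_node, three_node)  # 5 2
```

<p>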
When followers learn about newly committed commands, they also apply committed commands to their own state machine.</p> <p>For the most part, these details are graphically summarized in Figure 2 of the <a href="https://raft.github.io/raft.pdf">Raft paper</a>.</p> <h3 id="availability-and-linearizability">Availability and linearizability</h3><p>Taking a step back, distributed consensus helps a group of nodes, a cluster, agree on a value. A client of the cluster can treat a value from the cluster as if the value was atomically written to and read from a single thread. This property is called <a href="https://jepsen.io/consistency/models/linearizable">linearizability</a>.</p> <p>However, with distributed consensus, the client of the cluster has better availability guarantees from the cluster than if the client atomically wrote to or read from a single thread. A single thread that crashes becomes unavailable. But some number <code>f</code> of nodes can crash in a cluster implementing distributed consensus and the cluster will still 1) be available and 2) provide linearizable reads and writes.</p> <p>That is: <b>distributed consensus solves the problem of high availability for a system while remaining linearizable</b>.</p> <p>Without distributed consensus you can still achieve high availability. For example, a database might have two read replicas. But a client reading from a read replica might get stale data. Thus, this system (a database with two read replicas) is not linearizable.</p> <p>Without distributed consensus you can also try synchronous replication. It would be very simple to do. Have a fixed leader and require all nodes to acknowledge before committing. But the value here is extremely limited. If a single node in the cluster goes down the entire cluster is down.</p> <p>You might think I'm proposing a strawman. 
We could simply designate a permanent leader that handles all reads and writes, and require a majority of nodes to commit a command before the leader responds to a client. But in that case, what's the process for getting a lagging follower up-to-date? And what happens if it is the leader who goes down?</p> <p>Well, these are not trivial problems! And, beyond linearizability that we already mentioned, these problems are exactly what distributed consensus solves.</p> <h3 id="why-does-linearizability-matter?">Why does linearizability matter?</h3><p>It's very nice, and often even critical, to have a highly available system that will never give you stale data. And regardless, it's convenient to have a term for what we might naively think of as the "correct" way you'd always want to set and get a value.</p> <p>So linearizability is a convenient way of thinking about complex systems, if you can use or build a system that supports it. But it's not the only consistency approach you'll see in the wild.</p> <p>As you increase the guarantees of your consistency model, you tend to sacrifice performance. Going the opposite direction, some production systems sacrifice consistency to improve performance. For example, you might allow stale reads from any node, reading only from local state, so that you can reduce load on the leader and avoid the overhead of consensus.</p> <p>There are formal definitions for lower consistency models, including sequential and read-your-writes. You can read the <a href="https://jepsen.io/consistency">Jepsen page</a> for more detail.</p> <h3 id="best-and-worst-case-scenarios">Best and worst case scenarios</h3><p>A distributed system relies on communicating over the network. The worse the network, whether in terms of latency or reliability, the longer it will take for communication to happen.</p> <p>Aside from the network, disks can misdirect writes or corrupt data. 
Or you could be mounted on a network filesystem such as EBS.</p> <p>And processes themselves can crash due to low disk space or the OOM killer.</p> <p>It will take longer to achieve consensus to commit messages in these scenarios. If messages take longer to reach nodes, or if nodes are constantly crashing, followers will time out more often, triggering leader election. And the leader election itself (which also requires consensus) will also take longer.</p> <p>The best case scenario for distributed consensus is where the network is reliable and low-latency. Where disks are reliable and fast. And where processes don't often crash.</p> <p>TigerBeetle has an incredible <a href="https://sim.tigerbeetle.com/">visual simulator</a> that demonstrates what happens across ever-worsening environments. While TigerBeetle and this simulator use Viewstamped Replication, the demonstrated principles apply to Raft as well.</p> <h3 id="what-happens-when-you-add-nodes?">What happens when you add nodes?</h3><p>Distributed consensus algorithms make sure that some minimum number of nodes in a cluster agree before continuing. The minimum number is proportional to the total number of nodes in the cluster.</p> <p>A typical implementation of Raft for example will require 3 nodes in a 5-node cluster to agree before continuing. 4 nodes in a 7-node cluster. And so on.</p> <p>Recall that the p99 latency for a service is at least as bad as the slowest external request the service must make. As you increase the number of nodes you must talk to in a consensus cluster, you increase the chance of a slow request.</p> <p>Consider the extreme case of a 101-node cluster requiring 51 nodes to respond before returning to the client. That's 51 chances for a slower request. Compared to 4 chances in a 7-node cluster. The 101-node cluster is certainly more highly available though! It can tolerate 50 nodes going down. The 7-node cluster can only tolerate 3 nodes going down. 
The scenario where 50 nodes go down (assuming they're in different availability zones) seems pretty unlikely!</p> <h3 id="horizontal-scaling-with-distributed-consensus?-not-exactly">Horizontal scaling with distributed consensus? Not exactly</h3><p>All of this is to say that the most popular algorithms for distributed consensus, on their own, have nothing to do with horizontal scaling.</p> <p>The way that horizontally scaling databases like Cockroach or Yugabyte or Spanner work is by sharding the data, transparent to the client. Within each shard data is replicated with a dedicated distributed consensus cluster.</p> <p>So, yes, distributed consensus can be a <em>part</em> of horizontal scaling. But again what distributed consensus primarily solves is high availability via replication while remaining linearizable.</p> <p>This is not a trivial point to make. <a href="https://web.archive.org/web/20230327030543/https://etcd.io/docs/v3.2/learning/why/#using-etcd-for-metadata">etcd</a>, <a href="https://web.archive.org/web/20231212132325/https://www.hashicorp.com/resources/operating-and-running-consul-at-scale">consul</a>, and <a href="https://github.com/rqlite/rqlite">rqlite</a> are examples of databases that do not do sharding, only replication, via a single Raft cluster that replicates all data for the entire system.</p> <p>For these databases there is no horizontal scaling. If they support "horizontal scaling", they support this by doing non-linearizable (stale) reads. Writes remain a challenge.</p> <p>This doesn't mean these databases are bad. They are not. One obvious advantage they have over Cockroach or Spanner is that they are conceptually simpler. Conceptually simpler often equates to easier to operate. 
That's a big deal.</p> <h3 id="optimizations">Optimizations</h3><p>We've covered the basics of operation, but real-world implementations get more complex.</p> <h4 id="snapshots">Snapshots</h4><p>Rather than letting the log grow indefinitely, most libraries implement snapshotting. The user of the library provides a state machine and also provides a method for serializing the state machine to disk. The Raft library periodically serializes the state machine to disk and truncates the log.</p> <p>When a follower is so far behind that the leader no longer has a log entry (because it has been truncated), the leader transfers an entire snapshot to the follower. Then once the follower is caught up on snapshots, the leader can transfer normal log entries again.</p> <p>This technique is described in the Raft paper. While it isn't necessary for Raft to work, it's so important that it is hardly an optimization and more a required part of a production Raft system.</p> <h4 id="batching">Batching</h4><p>Rather than limiting clients of the cluster to submitting only one command at a time, it's common for the cluster to accept many commands at a time. Similarly, many commands at a time are submitted to followers. When any node needs to write commands to disk, it can batch commands to disk as well.</p> <p>But you can go a step beyond this in a way that is completely opaque to the Raft library. Each opaque command the client submits can <em>also</em> contain a batch of messages. 
In this scenario, only the user-provided state machine needs to be aware that each command it receives is actually a batch of messages that it should pull apart and interpret separately.</p> <p>This latter technique is a fairly trivial way to increase throughput by an order of magnitude or two.</p> <h4 id="disk-and-network">Disk and network</h4><p>In terms of how data is stored on disk and how data is sent over the network there is obvious room for optimization.</p> <p>A naive implementation might store JSON on disk and send JSON over the network. A slightly more optimized implementation might store binary data on disk and send binary data over the network.</p> <p>Similarly you can swap out your RPC for gRPC or introduce zlib for compression to network or disk.</p> <p>You can swap out synchronous IO for libaio or io_uring or SPDK/DPDK.</p> <p>A little tweak I made in my latest Raft implementation was to index log entries so searching the log was not a linear operation. Another little tweak was to introduce a page cache to eliminate unnecessary disk reads. This increased throughput by an order of magnitude.</p> <h4 id="flexible-quorums">Flexible quorums</h4><p>This brilliant <a href="https://arxiv.org/pdf/1608.06696.pdf">optimization</a> by Heidi Howard and co. shows you can relax the quorum required for committing new commands so long as you increase the quorum required for electing a leader.</p> <p>In an environment where leader election doesn't happen often, flexible quorums can increase throughput and decrease latency. And it's a pretty easy change to make!</p> <h4 id="more">More</h4><p>These are just a few common optimizations. You can also read about <a href="https://www.pingcap.com/blog/optimizing-raft-in-tikv/">parallel state machine apply</a>, <a href="https://www.pingcap.com/blog/optimizing-raft-in-tikv/">parallel append to disk</a>, witnesses, <a href="https://vldb.org/pvldb/vol14/p2203-whittaker.pdf">compartmentalization</a>, and leader leases. 
TiKV, Scylla, RedPanda, and Cockroach tend to have public material talking about this stuff.</p> <p>There are also a few people I follow who are often reviewing relevant papers, if they are not producing their own. I encourage you to follow them too if this is interesting to you:</p> <ul> <li><a href="https://muratbuffalo.blogspot.com/">https://muratbuffalo.blogspot.com/</a></li> <li><a href="https://charap.co/">https://charap.co/</a></li> <li><a href="https://brooker.co.za/blog/">https://brooker.co.za/blog/</a></li> <li><a href="https://distributed-computing-musings.com/">https://distributed-computing-musings.com/</a></li> </ul> <h3 id="safety-and-testing">Safety and testing</h3><p>The other aspect to consider is safety. For example, checksums for everything written to disk and passed over the network; or <a href="https://www.usenix.org/conference/fast18/presentation/alagappan">being able to recover</a> from corruption in the log.</p> <p>Testing is also a big deal. There are prominent tools like <a href="https://jepsen.io/">Jepsen</a> that check for consistency in the face of fault injection (process failure, network failure, etc.). But even Jepsen has its limits. For example, it doesn't test disk failure.</p> <p>FoundationDB <a href="https://www.youtube.com/watch?v=4fFDFbi3toc">made popular</a> a number of testing techniques. And the people behind this testing went on to build a product, <a href="https://antithesis.com/">Antithesis</a>, around deterministic testing of non-deterministic code while injecting faults.</p> <p>And on that topic there's Facebook Experimental's <a href="https://github.com/facebookexperimental/hermit">Hermit</a> deterministic Linux hypervisor that may help to test complex distributed systems. However, my experience with it has not been great and the maintainers do not seem very engaged with other people who have reported bugs. 
I'm hopeful for it but we'll see.</p> <p>Antithesis and Hermit seem like a boon when half the trouble of working on distributed consensus implementations is avoiding flakey tests.</p> <p>Another promising avenue is emitting logs during the Raft lifecycle and validating the logs against a TLA+ spec. Microsoft has such a project that has <a href="https://github.com/etcd-io/raft/issues/111">begun to see adoption</a> among open-source Raft implementations.</p> <h3 id="conclusion">Conclusion</h3><p>Everything aside, consensus is expensive. There is overhead to the entire consensus process. So if you do not need this level of availability and can settle for some process via backups, it's going to have lower latency and higher throughput than if it had to go through distributed consensus.</p> <p>If you do need high availability, distributed consensus can be a great choice. But consider the environment and what you want from your consensus algorithm.</p> <p>Also, while MultiPaxos, Raft, and Viewstamped Replication are some of the most popular algorithms for distributed consensus, there is a world beyond. Two-phase commit, ZooKeeper Atomic Broadcast, PigPaxos, EPaxos, Accord by Cassandra. 
The world of distributed consensus also gets especially weird and interesting outside of OLTP systems.</p> <p>But that's enough for one post.</p> <h3 id="further-reading">Further reading</h3><ul> <li><a href="https://raft.github.io/raft.pdf">The Raft Paper</a></li> <li><a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">The Raft TLA+ Spec</a></li> <li><a href="https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf">The Raft Author's PhD Thesis on Raft</a></li> <li><a href="https://dataintensive.net/">Designing Data-Intensive Applications</a></li> <li><a href="https://dabeaz.com/raft.html">David Beazley's Raft Course</a> if you can get your company to pay for it</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post about building an intuition for distributed consensus in OLTP systems!<br><br>Very grateful to all the folks who reviewed.<a href="https://t.co/wMxUuokKeg">https://t.co/wMxUuokKeg</a> <a href="https://t.co/cfY2kdfqak">pic.twitter.com/cfY2kdfqak</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1755580821476397527?ref_src=twsrc%5Etfw">February 8, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.htmlThu, 08 Feb 2024 00:00:00 +0000Writing a minimal in-memory storage engine for MySQL/MariaDBhttp://notes.eatonphil.com/2024-01-09-minimal-in-memory-storage-engine-for-mysql.html<p><!-- -*- mode: markdown -*- --></p> <p>I <a href="https://eatonphil.com/2024-01-wehack-mysql.html">spent a week</a> looking at MySQL/MariaDB internals along with ~80 other devs. Although MySQL and MariaDB are mostly the same (more on that later), I focused on MariaDB specifically this week.</p> <p>Before last week I had never built MySQL/MariaDB before. 
The first day of this hack week, I got MariaDB building locally and <a href="https://twitter.com/eatonphil/status/1742649922791395501">made a code tweak</a> so that <code>SELECT 23</code> returned <code>213</code>, and <a href="https://twitter.com/eatonphil/status/1742654868085526896">another tweak</a> so that <code>SELECT 80 + 20</code> returned <code>60</code>. The second day I got a <a href="https://twitter.com/eatonphil/status/1742958892957446490">basic UDF in C</a> working so that <code>SELECT mysum(20, 30)</code> returned <code>50</code>.</p> <p>The rest of the week I spent figuring out how to build a minimal in-memory storage engine, which I'll walk through in this post. 218 lines of C++.</p> <p>It supports <code>CREATE</code>, <code>DROP</code>, <code>INSERT</code>, and <code>SELECT</code> for tables that only have <code>INTEGER</code> fields. It is explicitly not thread-safe because I didn't have time to understand MariaDB's lock primitives.</p> <p>In this post I'll also talk about how the MariaDB custom storage API compares to the Postgres one, based on <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">a previous hack week project I did</a>.</p> <p>All code for this post can be found in <a href="https://github.com/eatonphil/mariadb/tree/11.4/storage/memem">my fork on GitHub</a>.</p> <h3 id="mysql-and-mariadb">MySQL and MariaDB</h3><p>Before we go further though, why do I keep saying MySQL/MariaDB?</p> <p>MySQL is GPL licensed (let's completely ignore the commercial variations of MySQL that Oracle offers). The code is open-source. However, the development is done behind closed doors. There is a code dump <a href="https://github.com/mysql/mysql-server/commits/trunk/">every month</a> or so.</p> <p>MariaDB is a fork of MySQL by the creator of MySQL (who is no longer involved, as it happens). It is also GPL licensed (let's completely ignore the commercial variations of MariaDB that MariaDB Corporation offers). 
The code is open-source. The development is also open-source.</p> <p>When you install "MySQL" in your Linux distro you are <a href="https://mariadb.com/newsroom/press-releases/mariadb-replaces-mysql-as-the-default-in-debian-9/">often actually</a> installing MariaDB.</p> <p>The two are mostly compatible. During this week, I <a href="https://twitter.com/eatonphil/status/1742642758408405237">stumbled onto</a> the fact that they evolved support for <code>SELECT .. FROM VALUES ..</code> differently. Some differences are documented on <a href="https://mariadb.com/kb/en/moving-from-mysql/">the MariaDB KB</a>. But this KB is painful to browse. Which leads me to my next point.</p> <p>The <a href="https://dev.mysql.com/doc/">MySQL docs</a> are excellent. They are easy to read and browse, and they are thorough. The <a href="https://mariadb.com/kb">MariaDB docs</a> are a work in progress. I'm sorry I can't be stoic: in just a week I've come to really hate using this KB. Thankfully, in some twisted way, it doesn't seem to be very thorough either. It isn't completely avoidable though since there is no guarantee MySQL and MariaDB do the same thing.</p> <p>Ultimately, I spent the week using MariaDB because I'm biased toward fully open projects. But I kept having to look at MySQL docs, hoping they were relevant.</p> <p>Now that you understand the state of things, let's move on to fun stuff!</p> <h3 id="storage-engines">Storage engines</h3><p>Mature databases often support swapping out the storage layer. Maybe you want an in-memory storage layer so that you can quickly run integration tests. Maybe you want to switch between B-Trees (read-optimized) and LSM Trees (write-optimized) and unordered heaps (write-optimized) depending on your workload. Or maybe you just want to try a third-party storage library (e.g. 
<a href="https://rocksdb.org/">RocksDB</a> or <a href="https://sled.rs/">Sled</a> or <a href="https://tikv.org/">TiKV</a>).</p> <p>The benefit of swapping out only the storage engine is that, from a user's perspective, the semantics and features of the database stay mostly the same. But the database is magically faster for a workload.</p> <p>You keep powerful user management, extension support, SQL support, and a well-known wire protocol. You modify only the method of storing the actual data.</p> <h4 id="existing-storage-engines">Existing storage engines</h4><p>MySQL/MariaDB is particularly well known for its custom storage engine support. The MySQL docs for <a href="https://dev.mysql.com/doc/refman/8.0/en/storage-engines.html">alternate storage engines</a> are great.</p> <p>While the docs do warn that you should probably stick with the default storage engine, that warning didn't quite feel strong enough because nothing else seemed to indicate the state of other engines.</p> <p>Specifically, in the past I was always interested in the CSV storage engine. But when you look at the <a href="https://github.com/MariaDB/server/blob/11.4/storage/csv/ha_tina.cc">actual code for the CSV engine</a> there is a pretty strong warning:</p> <div class="highlight"><pre><span></span>First off, this is a play thing for me, there are a number of things wrong with it:
 *) It was designed for csv and therefore its performance is highly questionable.
 *) Indexes have not been implemented. This is because the files can be traded in and out of the table directory without having to worry about rebuilding anything.
 *) NULLs and &quot;&quot; are treated equally (like a spreadsheet).
 *) There was in the beginning no point to anyone seeing this other then me, so there is a good chance that I haven&#39;t quite documented it well.
 *) Less design, more &quot;make it work&quot;

Now there are a few cool things with it:
 *) Errors can result in corrupted data files.
 *) Data files can be read by spreadsheets directly.

TODO:
 *) Move to a block system for larger files
 *) Error recovery, its all there, just need to finish it
 *) Document how the chains work.

-Brian
</pre></div> <p>The difference between the seeming confidence of the docs and seeming confidence of the contributor made me chuckle.</p> <p>The benefit of these diverse storage engines for me was that they give examples of how to implement the storage engine API. The <a href="https://github.com/MariaDB/server/blob/11.4/storage/csv">csv</a>, <a href="https://github.com/MariaDB/server/tree/11.4/storage/blackhole">blackhole</a>, <a href="https://github.com/MariaDB/server/tree/11.4/storage/example">example</a>, and <a href="https://github.com/MariaDB/server/tree/11.4/storage/heap">heap</a> storage engines were particularly helpful to read.</p> <p>The heap engine is a complete in-memory storage engine. Complete means complex though. So there seemed to be room for a stripped down version of an in-memory engine.</p> <p>And that's what we'll cover in this post! First though I want to talk a little bit about the limitations of custom storage engines.</p> <h3 id="limitations">Limitations</h3><p>While being able to tailor a storage engine to a workload is powerful, there are limits to the benefits based on the design of the storage API.</p> <p>Both Postgres and MySQL/MariaDB currently have a custom storage API built around <em>individual rows</em>.</p> <h4 id="column-wise-execution">Column-wise execution</h4><p>I have <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">previously written</a> that custom storage engines allow you to switch between column- and row-oriented data storage. 
Two big reasons to do column-wise storage are 1) opportunity for compression, and 2) fast operations on a single column.</p> <p>The opportunity for 1) compression <em>on disk</em> would still exist even if you needed to deal with individual rows at the storage API layer since the compression could happen on disk. However any benefits of passing around compressed columns <em>in memory</em> disappear if you must convert to rows for the storage API.</p> <p>You'd also lose the advantage for 2) fast operations on a single column if the column must be converted into a row at the storage API whereupon it's passed to higher levels that perform execution. The execution would happen row-wise, not column-wise.</p> <p>All of this is to say that while column-wise storage is possible, the <em>benefit of doing so</em> is not obvious with the current API design for both MySQL/MariaDB and Postgres.</p> <h4 id="vectorization">Vectorization</h4><p>An API built around individual rows also sets limits on the amount of vectorization you can do. A custom storage engine could still do some vectorization under the hood: always filling a buffer with N rows and returning a row from the buffer when the storage API requests a single row. But there is likely some degree of performance left on the table with an API that deals with individual rows.</p> <p>Remember though: if you did batched reads and writes of rows in the custom storage layer, there isn't necessarily any vectorization happening at the execution layer. From a <a href="https://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html">previous study</a> I did, neither MySQL/MariaDB nor Postgres do vectorized query execution. 
This paragraph isn't a critique of the storage API; it's just something to keep in mind.</p> <h4 id="storage-versus-execution">Storage versus execution</h4><p>The general point I'm making here is that unless both the execution and storage APIs are designed in a certain way, you may attempt optimizations in the storage layer that are ineffective or even harmful because the execution layer doesn't or can't take advantage of them.</p> <h4 id="nothing-permanent">Nothing permanent</h4><p>The current limitations of the storage API are not intrinsic aspects of MySQL/MariaDB or Postgres's design. For both projects there used to be no pluggable storage at all. We can imagine a future patch to either project that allows support for batched row reads and writes that together could make column-wise storage and vectorized execution more feasible.</p> <p>Even today there have been invasive attempts to fully support <a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/">column-wise storage and execution</a> in Postgres. 
And there have also been projects to bring <a href="https://github.com/citusdata/postgres_vectorization_test">vectorized execution to Postgres</a>.</p> <p>I'm not familiar enough with the MySQL landscape to comment on efforts there at the moment.</p> <h3 id="debug-build-of-mariadb-running-locally">Debug build of MariaDB running locally</h3><p>Now that you've got some background, let's get a debug build of MariaDB!</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/MariaDB/server<span class="w"> </span>mariadb
<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>mariadb
<span class="gp">$ </span>mkdir<span class="w"> </span>build
<span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>build
<span class="gp">$ </span>cmake<span class="w"> </span>-DCMAKE_BUILD_TYPE<span class="o">=</span>Debug<span class="w"> </span>..
<span class="gp">$ </span>make<span class="w"> </span>-j8
</pre></div> <p>This takes a while. When I was hacking on Postgres (a C project), it took 1 minute on my beefy Linux server to build. It took 20-30 minutes to build MySQL/MariaDB from scratch. 
That's C++ for you!</p> <p>Thankfully, incremental builds of MySQL/MariaDB after a small tweak take roughly the same time as incremental builds of Postgres.</p> <p>Once the build is done, create a database.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>./build/scripts/mariadb-install-db<span class="w"> </span>--srcdir<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span><span class="w"> </span>--datadir<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/db
</pre></div> <p>And create a config for the database.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;[client]</span>
<span class="go">socket=$(pwd)/mariadb.sock</span>
<span class="go">[mariadb]</span>
<span class="go">socket=$(pwd)/mariadb.sock</span>
<span class="go">basedir=$(pwd)</span>
<span class="go">datadir=$(pwd)/db</span>
<span class="go">pid-file=$(pwd)/db.pid&quot; &gt; my.cnf</span>
</pre></div> <p>Start up the server.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>./build/sql/mariadbd<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--debug:d:o,<span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/db.debug
<span class="go">./build/sql/mariadbd: Can&#39;t create file &#39;/var/log/mariadb/mariadb.log&#39; (errno: 13 &quot;Permission denied&quot;)</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Starting MariaDB 11.4.0-MariaDB-debug source revision 3fad2b115569864d8c1b7ea90ce92aa895cfef08 as process 185550</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Compressed tables use zlib 1.2.13</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Number of transaction pools: 1</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Initializing buffer pool, total size = 128.000MiB, chunk size = 2.000MiB</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Completed initialization of buffer pool</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: End of log at LSN=57155</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Opened 3 undo tablespaces</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: 128 rollback segments in 3 undo tablespaces are active.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Setting file &#39;./ibtmp1&#39; size to 12.000MiB. Physically writing the file full; Please wait ...</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: File &#39;./ibtmp1&#39; size is now 12.000MiB.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: log sequence number 57155; transaction id 16</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Loading buffer pool(s) from ./db/ib_buffer_pool</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Plugin &#39;FEEDBACK&#39; is disabled.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Plugin &#39;wsrep-provider&#39; is disabled.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] InnoDB: Buffer pool(s) load completed at 240103 17:10:15</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Server socket created on IP: &#39;0.0.0.0&#39;.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] Server socket created on IP: &#39;::&#39;.</span>
<span class="go">2024-01-03 17:10:15 0 [Note] mariadbd: Event Scheduler: Loaded 0 events</span>
<span class="go">2024-01-03 17:10:15 0 [Note] ./build/sql/mariadbd: ready for connections.</span>
<span class="go">Version: &#39;11.4.0-MariaDB-debug&#39; socket: &#39;./mariadb.sock&#39; port: 3306 Source distribution</span>
</pre></div> <p class="note"> With that <code>--debug</code> flag, debug logs will show up in <code>$(pwd)/db.debug</code>. It's unclear why debug logs are treated separately from the console logs shown here. I'd rather them all be in one place. 
</p><p>In another terminal, run a client and make a request!</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span>
<span class="go">Reading table information for completion of table and column names</span>
<span class="go">You can turn off this feature to get a quicker startup with -A</span>
<span class="go">Welcome to the MariaDB monitor. Commands end with ; or \g.</span>
<span class="go">Your MariaDB connection id is 3</span>
<span class="go">Server version: 11.4.0-MariaDB-debug Source distribution</span>
<span class="go">Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.</span>
<span class="go">Type &#39;help;&#39; or &#39;\h&#39; for help. Type &#39;\c&#39; to clear the current input statement.</span>
<span class="go">MariaDB [test]&gt; SELECT 1;</span>
<span class="go">+---+</span>
<span class="go">| 1 |</span>
<span class="go">+---+</span>
<span class="go">| 1 |</span>
<span class="go">+---+</span>
<span class="go">1 row in set (0.001 sec)</span>
</pre></div> <p>Huzzah! Let's write a custom storage engine!</p> <h3 id="where-does-the-code-go?">Where does the code go?</h3><p>When writing an extension for some project, I usually expect to have the extension exist in its own repo. I was able to do this with the <a href="https://github.com/eatonphil/pgtam">Postgres in-memory storage engine I wrote</a>. And in general, Postgres extensions exist as their own repos.</p> <p>I was able to create and build a UDF plugin outside the MariaDB source tree. 
But when it came to getting a storage engine to build and load successfully, I wasted almost an entire day (a large amount of time in a single hack week) getting nowhere.</p> <p>Extensions for MySQL/MariaDB are most easily built via the CMake infrastructure within the repo. Surely there's <em>some</em> way to replicate that infrastructure from outside the repo but I wasn't able to figure it out within a day and didn't want to spend more time on it.</p> <p>Apparently the <a href="https://twitter.com/kastauyra/status/1743346665442935174">normal thing to do</a> in MySQL/MariaDB is to keep extensions within a fork of MySQL/MariaDB.</p> <p>When I switched to this method I was able to very quickly get the storage engine building and loaded. So that's what we'll do.</p> <h3 id="boilerplate">Boilerplate</h3><p>Within the MariaDB source tree, create a new folder in the <code>storage</code> subdirectory.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>mkdir<span class="w"> </span>storage/memem </pre></div> <p>Within <code>storage/memem/CMakeLists.txt</code> add the following.</p> <div class="highlight"><pre><span></span><span class="c"># Copyright (c) 2006, 2010, Oracle and/or its affiliates. All rights reserved.</span> <span class="c"># </span> <span class="c"># This program is free software; you can redistribute it and/or modify</span> <span class="c"># it under the terms of the GNU General Public License as published by</span> <span class="c"># the Free Software Foundation; version 2 of the License.</span> <span class="c"># </span> <span class="c"># This program is distributed in the hope that it will be useful,</span> <span class="c"># but WITHOUT ANY WARRANTY; without even the implied warranty of</span> <span class="c"># MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the</span> <span class="c"># GNU General Public License for more details.</span> <span class="c"># </span> <span class="c"># You should have received a copy of the GNU General Public License</span> <span class="c"># along with this program; if not, write to the Free Software</span> <span class="c"># Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA</span> <span class="nb">SET</span><span class="p">(</span><span class="s">MEMEM_SOURCES</span><span class="w"> </span><span class="s">ha_memem.cc</span><span class="w"> </span><span class="s">ha_memem.h</span><span class="p">)</span> <span class="nb">MYSQL_ADD_PLUGIN</span><span class="p">(</span><span class="s">memem</span><span class="w"> </span><span class="o">${</span><span class="nv">MEMEM_SOURCES</span><span class="o">}</span><span class="w"> </span><span class="s">STORAGE_ENGINE</span><span class="p">)</span> </pre></div> <p>This hooks into the MySQL/MariaDB build infrastructure, so the next time you run <code>make</code> within the <code>build</code> directory we created above, it will also build this project.</p> <h3 id="the-storage-engine-class">The storage engine class</h3><p>It would be nice to see a way to extend MySQL in C (for one, because it would then be easier to port to other languages). But all of the built-in storage engines use classes. So we'll do that too.</p> <p>The class we must implement is a subclass of <a href="https://github.com/MariaDB/server/blob/11.4/sql/handler.h#L3200"><code>handler</code></a>. There is a single <code>handler</code> instance per thread, corresponding to a single running query. (Postgres gives each connection its own process; MySQL, by default, gives each connection its own thread.) However, <code>handler</code> instances are reused across different queries.</p> <p>There are a number of virtual methods on <code>handler</code> we must implement in our subclass. For most of them we'll do nothing: simply returning immediately. 
These simple methods will be implemented in <code>ha_memem.h</code>. The methods with more complex logic will be implemented in <code>ha_memem.cc</code>.</p> <p>Let's set up includes in <code>ha_memem.h</code>.</p> <div class="highlight"><pre><span></span><span class="cm">/* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.</span> <span class="cm"> This program is free software; you can redistribute it and/or modify</span> <span class="cm"> it under the terms of the GNU General Public License as published by</span> <span class="cm"> the Free Software Foundation; version 2 of the License.</span> <span class="cm"> This program is distributed in the hope that it will be useful,</span> <span class="cm"> but WITHOUT ANY WARRANTY; without even the implied warranty of</span> <span class="cm"> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span> <span class="cm"> GNU General Public License for more details.</span> <span class="cm"> You should have received a copy of the GNU General Public License</span> <span class="cm"> along with this program; if not, write to the Free Software</span> <span class="cm"> Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */</span> <span class="cp">#ifdef USE_PRAGMA_INTERFACE</span> <span class="cp">#pragma interface </span><span class="cm">/* gcc class implementation */</span> <span class="cp">#endif</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;thr_lock.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;handler.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;table.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;sql_const.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;vector&gt;</span> <span class="cp">#include</span><span class="w"> </span><span 
class="cpf">&lt;memory&gt;</span> </pre></div> <p>Next we'll define structs for our in-memory storage.</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">uchar</span><span class="o">&gt;</span><span class="w"> </span><span class="n">MememRow</span><span class="p">;</span> <span class="k">struct</span><span class="w"> </span><span class="nc">MememTable</span> <span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">MememRow</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">rows</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">name</span><span class="p">;</span> <span class="p">};</span> <span class="k">struct</span><span class="w"> </span><span class="nc">MememDatabase</span> <span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">MememTable</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">tables</span><span class="p">;</span> <span class="p">};</span> </pre></div> <p>Within <code>ha_memem.cc</code> we'll implement a global (not thread-safe) <code>static MememDatabase*</code> that all <code>handler</code> instances will query when requested. 
We need the definitions in the header file though because we'll store the table currently being queried in the <code>handler</code> subclass.</p> <p>This is so that every call to <code>write_row</code> to write a single row or call to <code>rnd_next</code> to read a single row does not need to look up the in-memory table object N times within the same query.</p> <p>And finally we'll define the subclass of <code>handler</code> and implementations of trivial methods.</p> <div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="nc">ha_memem</span><span class="w"> </span><span class="k">final</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="k">public</span><span class="w"> </span><span class="n">handler</span> <span class="p">{</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">current_position</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">MememTable</span><span class="o">&gt;</span><span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="k">public</span><span class="o">:</span> <span class="w"> </span><span class="n">ha_memem</span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE_SHARE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">handler</span><span class="p">(</span><span class="n">hton</span><span 
class="p">,</span><span class="w"> </span><span class="n">table_arg</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">~</span><span class="n">ha_memem</span><span class="p">()</span><span class="o">=</span><span class="w"> </span><span class="k">default</span><span class="p">;</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="nf">index_type</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">key_number</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">ulonglong</span><span class="w"> </span><span class="nf">table_flags</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">ulong</span><span class="w"> </span><span class="nf">index_flags</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">inx</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">part</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">all_parts</span><span class="p">)</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="cm">/* The following defines can be increased if necessary */</span> <span class="cp">#define MEMEM_MAX_KEY MAX_KEY </span><span class="cm">/* Max allowed keys */</span> <span class="cp">#define MEMEM_MAX_KEY_SEG 16 </span><span class="cm">/* Max segments for key */</span> <span class="cp">#define MEMEM_MAX_KEY_LENGTH 3500 </span><span class="cm">/* Like in InnoDB */</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_keys</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_key_length</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY_LENGTH</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="nf">max_supported_key_part_length</span><span class="p">()</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">MEMEM_MAX_KEY_LENGTH</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">open</span><span 
class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">mode</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">test_if_locked</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">close</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">truncate</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">rnd_init</span><span class="p">(</span><span class="kt">bool</span><span class="w"> </span><span class="n">scan</span><span class="p">);</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">rnd_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">);</span> <span class="w"> </span><span class="kt">int</span><span 
class="w"> </span><span class="nf">rnd_pos</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">pos</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">,</span> <span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">ha_rkey_function</span><span class="w"> </span><span class="n">find_flag</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_idx_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">idx</span><span 
class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">,</span> <span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">ha_rkey_function</span><span class="w"> </span><span class="n">find_flag</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_read_last_map</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="w"> </span><span class="n">key_part_map</span><span class="w"> </span><span class="n">keypart_map</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_prev</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_first</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">index_last</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">position</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">record</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span><span class="w"> </span><span class="k">return</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">info</span><span class="p">(</span><span class="n">uint</span><span class="w"> </span><span class="n">flag</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">external_lock</span><span class="p">(</span><span class="n">THD</span><span class="w"> </span><span class="o">*</span><span class="n">thd</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">lock_type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">create</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">,</span><span class="w"> </span><span class="n">HA_CREATE_INFO</span><span class="w"> </span><span class="o">*</span><span class="n">create_info</span><span class="p">);</span> <span class="w"> </span><span class="n">THR_LOCK_DATA</span><span class="w"> </span><span class="o">**</span><span class="nf">store_lock</span><span 
class="p">(</span><span class="n">THD</span><span class="w"> </span><span class="o">*</span><span class="n">thd</span><span class="p">,</span><span class="w"> </span><span class="n">THR_LOCK_DATA</span><span class="w"> </span><span class="o">**</span><span class="n">to</span><span class="p">,</span> <span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="nc">thr_lock_type</span><span class="w"> </span><span class="n">lock_type</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">to</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">delete_table</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="k">private</span><span class="o">:</span> <span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span> <span class="w"> </span><span class="k">virtual</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">write_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">);</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">update_row</span><span class="p">(</span><span class="k">const</span><span class="w"> 
</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">old_data</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">new_data</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_WRONG_COMMAND</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">delete_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_WRONG_COMMAND</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>A complete storage engine might seriously implement all of these methods. But we'll only seriously implement 7 of them.</p> <p>To finish up the boilerplate, we'll switch over to <code>ha_memem.cc</code> and set up the includes.</p> <div class="highlight"><pre><span></span><span class="cm">/* Copyright (c) 2005, 2012, Oracle and/or its affiliates. 
All rights reserved.</span> <span class="cm"> This program is free software; you can redistribute it and/or modify</span> <span class="cm"> it under the terms of the GNU General Public License as published by</span> <span class="cm"> the Free Software Foundation; version 2 of the License.</span> <span class="cm"> This program is distributed in the hope that it will be useful,</span> <span class="cm"> but WITHOUT ANY WARRANTY; without even the implied warranty of</span> <span class="cm"> MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the</span> <span class="cm"> GNU General Public License for more details.</span> <span class="cm"> You should have received a copy of the GNU General Public License</span> <span class="cm"> along with this program; if not, write to the Free Software</span> <span class="cm"> Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1335 USA */</span> <span class="cp">#ifdef USE_PRAGMA_IMPLEMENTATION</span> <span class="cp">#pragma implementation </span><span class="c1">// gcc: Class implementation</span> <span class="cp">#endif</span> <span class="cp">#define MYSQL_SERVER 1</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;my_global.h&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;sql_priv.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;unireg.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;sql_class.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;ha_memem.h&quot;</span> </pre></div> <p>Ok! Let's dig into the implementation.</p> <h3 id="implementation">Implementation</h3><h4 id="the-global-database">The global database</h4><p>First up, we need to declare a global <code>MememDatabase*</code> instance. 
We'll also implement a helper function for finding the index of a table by name within the database.</p> <div class="highlight"><pre><span></span><span class="c1">// WARNING! All accesses of `database` in this code are thread</span> <span class="c1">// unsafe. Since this was written during a hack week, I didn&#39;t have</span> <span class="c1">// time to figure out MySQL/MariaDB&#39;s runtime well enough to do the</span> <span class="c1">// thread-safe version of this.</span> <span class="k">static</span><span class="w"> </span><span class="n">MememDatabase</span><span class="w"> </span><span class="o">*</span><span class="n">database</span><span class="p">;</span> <span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_table_index</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">INT_MAX</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span 
class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">name</span><span class="o">-&gt;</span><span class="n">c_str</span><span class="p">(),</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">-1</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p class="note"> As I wrote this post, I noticed that this code also assumes there's only a single database. That isn't how MySQL works. Every time you call <code>USE ...</code> in MySQL you are switching between databases. You can query tables across databases. A real in-memory backend would need to be aware of the different databases, not just different tables. But to keep the code succinct we won't implement that in this post. 
</p><p>Next we'll implement plugin initialization and cleanup.</p> <h4 id="plugin-lifecycle">Plugin lifecycle</h4><p>Before we register the plugin with MariaDB, we need to set up initialization and cleanup methods for it.</p> <p>The initialization method will take care of initializing the global <code>MememDatabase* database</code> object. It will set up a handler for creating new instances of our <code>handler</code> subclass. And it will set up a handler for deleting tables.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="o">*</span><span class="nf">memem_create_handler</span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE_SHARE</span><span class="w"> </span><span class="o">*</span><span class="n">table</span><span class="p">,</span> <span class="w"> </span><span class="n">MEM_ROOT</span><span class="w"> </span><span class="o">*</span><span class="n">mem_root</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="p">(</span><span class="n">mem_root</span><span class="p">)</span><span class="w"> </span><span class="n">ha_memem</span><span class="p">(</span><span class="n">hton</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_init</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span 
class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="n">memem_hton</span><span class="p">;</span> <span class="w"> </span><span class="n">memem_hton</span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="p">)</span><span class="w"> </span><span class="n">p</span><span class="p">;</span> <span class="w"> </span><span class="n">memem_hton</span><span class="o">-&gt;</span><span class="n">db_type</span><span class="o">=</span><span class="w"> </span><span class="n">DB_TYPE_AUTOASSIGN</span><span class="p">;</span> <span class="w"> </span><span class="n">memem_hton</span><span class="o">-&gt;</span><span class="n">create</span><span class="o">=</span><span class="w"> </span><span class="n">memem_create_handler</span><span class="p">;</span> <span class="w"> </span><span class="n">memem_hton</span><span class="o">-&gt;</span><span class="n">drop_table</span><span class="o">=</span><span class="w"> </span><span class="p">[](</span><span class="n">handlerton</span><span class="w"> </span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">)</span> <span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_NO_SUCH_TABLE</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">erase</span><span class="p">(</span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">begin</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">&quot;info&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;[MEMEM] Deleted table &#39;%s&#39;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">memem_hton</span><span class="o">-&gt;</span><span class="n">flags</span><span class="o">=</span><span class="w"> </span><span class="n">HTON_CAN_RECREATE</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Initialize global in-memory database.</span> <span class="w"> </span><span class="n">database</span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">MememDatabase</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p class="note"> The <code>DBUG_PRINT</code> macro is a debug helper 
MySQL/MariaDB gives us. As noted above, the output is directed to a file specified by the <code>--debug</code> flag. Unfortunately I couldn't figure out how to flush the stream this macro writes to. Occasionally, after a segfault, logs I expected to see were missing, and the file would often contain what looked like partially written logs. Anyway, as long as there isn't a segfault, the debug file will eventually contain the <code>DBUG_PRINT</code> logs. </p><p>The only thing the plugin cleanup function must do is delete the global database.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="nf">memem_fini</span><span class="p">(</span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">p</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">delete</span><span class="w"> </span><span class="n">database</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Now we can register the plugin!</p> <h4 id="plugin-registration">Plugin registration</h4><p>The <code>maria_declare_plugin</code> and <code>maria_declare_plugin_end</code> macros register the plugin's metadata (name, version, etc.) 
and callbacks.</p> <div class="highlight"><pre><span></span><span class="k">struct</span><span class="w"> </span><span class="nc">st_mysql_storage_engine</span><span class="w"> </span><span class="n">memem_storage_engine</span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">MYSQL_HANDLERTON_INTERFACE_VERSION</span><span class="p">};</span> <span class="n">maria_declare_plugin</span><span class="p">(</span><span class="n">memem</span><span class="p">){</span> <span class="w"> </span><span class="n">MYSQL_STORAGE_ENGINE_PLUGIN</span><span class="p">,</span> <span class="w"> </span><span class="o">&amp;</span><span class="n">memem_storage_engine</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;MEMEM&quot;</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;MySQL AB&quot;</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;In-memory database.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">PLUGIN_LICENSE_GPL</span><span class="p">,</span> <span class="w"> </span><span class="n">memem_init</span><span class="p">,</span><span class="w"> </span><span class="cm">/* Plugin Init */</span> <span class="w"> </span><span class="n">memem_fini</span><span class="p">,</span><span class="w"> </span><span class="cm">/* Plugin Deinit */</span> <span class="w"> </span><span class="mh">0x0100</span><span class="w"> </span><span class="cm">/* 1.0 */</span><span class="p">,</span> <span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="cm">/* status variables */</span> <span class="w"> </span><span class="nb">NULL</span><span class="p">,</span><span class="w"> </span><span class="cm">/* system variables */</span> <span class="w"> </span><span class="s">&quot;1.0&quot;</span><span class="p">,</span><span class="w"> </span><span class="cm">/* string version 
*/</span> <span class="w"> </span><span class="n">MariaDB_PLUGIN_MATURITY_STABLE</span><span class="w"> </span><span class="cm">/* maturity */</span> <span class="p">}</span><span class="w"> </span><span class="n">maria_declare_plugin_end</span><span class="p">;</span> </pre></div> <p>That's it! Now we need to implement methods for writing rows, reading rows, and creating a new table.</p> <h4 id="create-table">Create table</h4><p>To create a table, we make sure one by this name doesn't already exist, make sure it only has <code>INTEGER</code> fields, allocate memory for the table, and append it to the global database.</p> <div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::create</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">TABLE</span><span class="w"> </span><span class="o">*</span><span class="n">table_arg</span><span class="p">,</span> <span class="w"> </span><span class="n">HA_CREATE_INFO</span><span class="w"> </span><span class="o">*</span><span class="n">create_info</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">-1</span><span class="p">);</span> <span class="w"> </span><span class="c1">// We only support INTEGER fields for now.</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span 
class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">table_arg</span><span class="o">-&gt;</span><span class="n">field</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">table_arg</span><span class="o">-&gt;</span><span class="n">field</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">type</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">MYSQL_TYPE_LONG</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">&quot;info&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;Unsupported field type.&quot;</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">MememTable</span><span class="o">&gt;</span><span class="p">();</span> <span class="w"> </span><span class="n">t</span><span class="o">-&gt;</span><span class="n">name</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span 
class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">t</span><span class="p">);</span> <span class="w"> </span><span class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">&quot;info&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;[MEMEM] Created table &#39;%s&#39;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Not very complicated. Let's handle <code>INSERT</code>-ing rows next.</p> <h4 id="insert-row">Insert row</h4><p>There is no method called when an <code>INSERT</code> starts. However, the <code>handler</code> parent class has a <code>table</code> field that is updated while a <code>SELECT</code> or <code>INSERT</code> is running. So we can fetch the current table from that field.</p> <p>Since we have a slot for a <code>std::shared_ptr&lt;MememTable&gt; memem_table</code> on the <code>ha_memem</code> class, we can check if it is <code>NULL</code> when we insert a row. If it is, we look up the current table and set <code>this-&gt;memem_table</code> to its <code>MememTable</code>.</p> <p>But there's a bit more to it than just the table name. The <code>const char* name</code> passed to the <code>create()</code> method above seems to be a sort of fully qualified name for the table. 
By observation, when creating a table <code>y</code> in a database <code>test</code>, the <code>const char* name</code> value is <code>./test/y</code>. The <code>.</code> prefix probably means that the database is local, but I'm not sure.</p> <p>So we'll write a helper method that will reconstruct the fully qualified table name before looking up that fully qualified table name in the global database.</p> <div class="highlight"><pre><span></span><span class="kt">void</span><span class="w"> </span><span class="nf">ha_memem::reset_memem_table</span><span class="p">()</span> <span class="p">{</span> <span class="w"> </span><span class="c1">// Reset table cursor.</span> <span class="w"> </span><span class="n">current_position</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">full_name</span><span class="o">=</span><span class="w"> </span><span class="s">&quot;./&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="n">table</span><span class="o">-&gt;</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">db</span><span class="p">.</span><span class="n">str</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;/&quot;</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">(</span><span class="n">table</span><span class="o">-&gt;</span><span class="n">s</span><span class="o">-&gt;</span><span class="n">table_name</span><span class="p">.</span><span class="n">str</span><span class="p">);</span> <span class="w"> </span><span 
class="n">DBUG_PRINT</span><span class="p">(</span><span class="s">&quot;info&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;[MEMEM] Resetting to &#39;%s&#39;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">full_name</span><span class="p">.</span><span class="n">c_str</span><span class="p">()));</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table_index</span><span class="p">(</span><span class="n">full_name</span><span class="p">.</span><span class="n">c_str</span><span class="p">());</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">.</span><span class="n">size</span><span class="p">());</span> <span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="n">database</span><span class="o">-&gt;</span><span class="n">tables</span><span class="p">[</span><span 
class="n">index</span><span class="p">];</span> <span class="p">}</span> </pre></div> <p>Then we can use this within <code>write_row</code> to figure out the current <code>MememTable</code> being queried.</p> <p>But first, let's digress into how MySQL stores rows.</p> <h4 id="the-mysql-row-api">The MySQL row API</h4><p>When you <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">write a Postgres custom storage API</a>, you are expected to basically read from or write to an array of <code>Datum</code>.</p> <p>Totally sensible.</p> <p>In MySQL, you read from and write to an array of bytes. That's pretty weird to me. Of course you can build your own higher level serialization/deserialization on top of it. But it's just strange to me everyone has to know this basically opaque API.</p> <p>Certainly <a href="https://github.com/MariaDB/server/blob/11.4/sql/handler.h#L3152">it's documented</a>.</p> <div class="highlight"><pre><span></span>The handler class is the interface for dynamically loadable storage engines. Do not add ifdefs and take care when adding or changing virtual functions to avoid vtable confusion Functions in this class accept and return table columns data. Two data representation formats are used: 1. TableRecordFormat - Used to pass [partial] table records to/from storage engine 2. KeyTupleFormat - used to pass index search tuples (aka &quot;keys&quot;) to storage engine. See opt_range.cc for description of this format. TableRecordFormat ================= [Warning: this description is work in progress and may be incomplete] The table record is stored in a fixed-size buffer: record: null_bytes, column1_data, column2_data, ... The offsets of the parts of the buffer are also fixed: every column has an offset to its column{i}_data, and if it is nullable it also has its own bit in null_bytes. </pre></div> <p>In our implementation, we'll skip the support for <code>NULL</code> values. We'll only support <code>INTEGER</code> fields. 
But we still need to be aware that the first byte of the record will be taken up by the NULL bitmap. We'll also assume there won't be more than one byte of a NULL bitmap.</p> <p>It is this opaque byte array that we'll read from in <code>write_row(const uchar* buf)</code> and write to in <code>rnd_next(uchar* buf)</code>.</p> <h4 id="insert-row-(take-two)">Insert row (take two)</h4><p>To keep things simple we're going to store the row in <code>MememTable</code> the same way MySQL passes it around.</p> <div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::write_row</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">memem_table</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Assume there are no NULLs.</span> <span class="w"> </span><span class="n">buf</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">uint</span><span class="w"> </span><span class="n">field_count</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">table</span><span class="o">-&gt;</span><span class="n">field</span><span class="p">[</span><span class="n">field_count</span><span class="p">])</span><span class="w"> </span><span 
class="n">field_count</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Store the row in the same format MariaDB gives us.</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">row</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">uchar</span><span class="o">&gt;&gt;</span><span class="p">(</span> <span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">field_count</span><span class="p">);</span> <span class="w"> </span><span class="n">memem_table</span><span class="o">-&gt;</span><span class="n">rows</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">row</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Which makes reading the row quite simple too!</p> <h4 id="read-row">Read row</h4><p>The only slight difference between reading and writing a row is that MySQL/MariaDB will tell us when the <code>SELECT</code> scan for a table starts.</p> <p>We'll use that opportunity to reset the <code>current_position</code> cursor and reset the <code>memem_table</code> field. 
We do this because, again, a <code>handler</code> instance is only used by one query at a time but is reused for queries running at other times.</p> <div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::rnd_init</span><span class="p">(</span><span class="kt">bool</span><span class="w"> </span><span class="n">scan</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">reset_memem_table</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> <span class="kt">int</span><span class="w"> </span><span class="nf">ha_memem::rnd_next</span><span class="p">(</span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">current_position</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">memem_table</span><span class="o">-&gt;</span><span class="n">rows</span><span class="p">.</span><span class="n">size</span><span class="p">())</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Reset the in-memory table to make logic errors more obvious.</span> <span class="w"> </span><span class="n">memem_table</span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">HA_ERR_END_OF_FILE</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">current_position</span><span class="w"> </span><span class="o">&lt;</span><span 
class="w"> </span><span class="n">memem_table</span><span class="o">-&gt;</span><span class="n">rows</span><span class="p">.</span><span class="n">size</span><span class="p">());</span> <span class="w"> </span><span class="n">uchar</span><span class="w"> </span><span class="o">*</span><span class="n">ptr</span><span class="o">=</span><span class="w"> </span><span class="n">buf</span><span class="p">;</span> <span class="w"> </span><span class="o">*</span><span class="n">ptr</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">ptr</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Rows internally are stored in the same format that MariaDB</span> <span class="w"> </span><span class="c1">// wants. So we can just copy them over.</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">uchar</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">row</span><span class="o">=</span><span class="w"> </span><span class="n">memem_table</span><span class="o">-&gt;</span><span class="n">rows</span><span class="p">[</span><span class="n">current_position</span><span class="p">];</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">copy</span><span class="p">(</span><span class="n">row</span><span class="o">-&gt;</span><span class="n">begin</span><span class="p">(),</span><span class="w"> </span><span class="n">row</span><span class="o">-&gt;</span><span class="n">end</span><span class="p">(),</span><span class="w"> </span><span class="n">ptr</span><span class="p">);</span> <span class="w"> </span><span class="n">current_position</span><span class="o">++</span><span 
class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And we're done!</p> <h3 id="build-and-test">Build and test</h3><p>Go back into the <code>build</code> directory we created within the source tree root and rerun <code>make -j8</code>.</p> <p>Kill the server (you'll need to do something like <code>killall mariadbd</code> since the server doesn't respond to Ctrl-c). And restart it.</p> <p>For some reason this plugin doesn't need to be loaded. We can run <code>SHOW PLUGINS;</code> in the MariaDB CLI and we'll see it.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span>/home/phil/vendor/mariadb/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span> <span class="go">Reading table information for completion of table and column names</span> <span class="go">You can turn off this feature to get a quicker startup with -A</span> <span class="go">Welcome to the MariaDB monitor. Commands end with ; or \g.</span> <span class="go">Your MariaDB connection id is 5</span> <span class="go">Server version: 11.4.0-MariaDB-debug Source distribution</span> <span class="go">Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.</span> <span class="go">Type &#39;help;&#39; or &#39;\h&#39; for help. 
Type &#39;\c&#39; to clear the current input statement.</span> <span class="go">MariaDB [test]&gt; SHOW PLUGINS;</span> <span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span> <span class="go">| Name | Status | Type | Library | License |</span> <span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span> <span class="go">| binlog | ACTIVE | STORAGE ENGINE | NULL | GPL |</span> <span class="go">...</span> <span class="go">| MEMEM | ACTIVE | STORAGE ENGINE | NULL | GPL |</span> <span class="go">...</span> <span class="go">| BLACKHOLE | ACTIVE | STORAGE ENGINE | ha_blackhole.so | GPL |</span> <span class="go">+-------------------------------+----------+--------------------+-----------------+---------+</span> <span class="go">73 rows in set (0.012 sec)</span> </pre></div> <p>There we go! To create a table with it we need to set <code>ENGINE = MEMEM</code>. For example, <code>CREATE TABLE x (i INT) ENGINE = MEMEM</code>.</p> <p>Let's create a script to try out the <code>memem</code> engine, in <code>storage/memem/test.sql</code>.</p> <div class="highlight"><pre><span></span><span class="k">drop</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">exists</span><span class="w"> </span><span class="n">y</span><span class="p">;</span> <span class="k">drop</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="k">exists</span><span class="w"> </span><span class="n">z</span><span class="p">;</span> <span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">y</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span 
class="n">j</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MEMEM</span><span class="p">;</span> <span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">1029</span><span class="p">);</span> <span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">92</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">);</span> <span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span> <span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">z</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">MEMEM</span><span class="p">;</span> <span class="k">insert</span><span class="w"> </span><span class="k">into</span><span 
class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">322</span><span class="p">);</span> <span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">values</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="p">);</span> <span class="k">select</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">z</span><span class="w"> </span><span class="k">where</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">20</span><span class="p">;</span> </pre></div> <p>And run it.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>./build/client/mariadb<span class="w"> </span>--defaults-extra-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/my.cnf<span class="w"> </span>--database<span class="o">=</span><span class="nb">test</span><span class="w"> </span>--table<span class="w"> </span>--verbose<span class="w"> </span>&lt;<span class="w"> </span>storage/memem/test.sql <span class="go">--------------</span> <span class="go">drop table if exists y</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">drop table if exists z</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">create table y(i int, j int) engine = MEMEM</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">insert into y values (2, 1029)</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">insert into y values (92, 8)</span> 
<span class="go">--------------</span> <span class="go">--------------</span> <span class="go">select * from y where i + 8 = 10</span> <span class="go">--------------</span> <span class="go">+------+------+</span> <span class="go">| i | j |</span> <span class="go">+------+------+</span> <span class="go">| 2 | 1029 |</span> <span class="go">+------+------+</span> <span class="go">--------------</span> <span class="go">create table z(a int) engine = MEMEM</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">insert into z values (322)</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">insert into z values (8)</span> <span class="go">--------------</span> <span class="go">--------------</span> <span class="go">select * from z where a &gt; 20</span> <span class="go">--------------</span> <span class="go">+------+</span> <span class="go">| a |</span> <span class="go">+------+</span> <span class="go">| 322 |</span> <span class="go">+------+</span> </pre></div> <p>What you see there is the power of storage engines! It supports the full SQL language even while we implemented storage somewhere completely different than the default.</p> <h3 id="in-memory-is-boring">In-memory is boring</h3><p>Certainly, I'm getting bored doing the same project over and over again on different databases. However, it's minimal projects like this that make it super easy to then go and port the storage engine to something else.</p> <p>The goal here is to be minimal but meaningful. And I've accomplished that for myself at least!</p> <h3 id="on-chatgpt">On ChatGPT</h3><p>As I've <a href="https://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.html#postscript:-on-chatgpt">written before</a>, this sort of exploration wouldn't be possible within the time frame I gave myself if it weren't for ChatGPT. 
Specifically, the paid tier GPT4.</p> <p>Neither the MySQL nor the MariaDB docs were so helpful that I could immediately figure out things like how to get the current table name within a scan (the <code>table</code> member of the <code>handler</code> class).</p> <p>With ChatGPT you can ask questions like: "In a MySQL C++ plugin, how do I get the name of the table from a <code>handler</code> class as a C string?". Sometimes it's right and sometimes it's not. But you can try out the code and if it builds it is at least somewhat correct!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post walking you through building a super minimal in-memory storage engine for MySQL/MariaDB in 218 lines of C++.<br><br>And took time again to reflect on the limitations of custom storage engines and how MySQL compares to Postgres internally here.<a href="https://t.co/nImUC36DPs">https://t.co/nImUC36DPs</a> <a href="https://t.co/1Oj2Lcua8O">pic.twitter.com/1Oj2Lcua8O</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1744822526088282587?ref_src=twsrc%5Etfw">January 9, 2024</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2024-01-09-minimal-in-memory-storage-engine-for-mysql.htmlTue, 09 Jan 2024 00:00:00 +0000Make your own wayhttp://notes.eatonphil.com/2023-12-26-make-your-own-way.html<p>Over the years, I have repeatedly felt like I missed the timing for a meetup or an IRC group or social media in general. I'd go to a meetup every so often but I'd never make a meaningful connection with people, whereas everyone else knew each other. I'd join an IRC group and have difficulty catching up with what seemed to be the flow of conversation.</p> <p>I hadn't thought much about this until the pandemic when I started a <a href="https://eatonphil.com/discord.html">Discord group for software internals</a> and a virtual tech talk series called Hacker Nights. 
Since 2021 the Discord has reached around 1,500 members and ~20 fairly active members. And the Meetup peaked at about 300 members with about 10-20 showing up each Meetup.</p> <p>After the pandemic receded I started an <a href="https://eatonphil.com/2023-ddia.html">NYC-based book club</a> over 2 months with about 5-8 active attendees. I ran a <a href="https://eatonphil.com/2023-10-wehack-postgres.html">virtual hack week on Discord</a> where I got ~100 devs into a temporary Discord server and we talked about Postgres internals and shared resources. Ultimately around 5 of us wrote blog posts and built new projects to explore Postgres.</p> <p>I started a <a href="https://eatonphil.com/2023-database-internals.html">virtual, async email book club</a> (that is still ongoing) with 300 devs from November 2023 to Feb 2024. There have been around 20 active members of the club. And each week the discussion is kicked off by one of the members, not myself.</p> <p>And I felt like there wasn't enough community opportunity for folks in systems programming in NYC so I started a <a href="https://eatonphil.com/nyc-systems-coffee-club.html">Manhattan-based Systems Coffee Club</a>. Around 15 people showed up to the first meeting and seemed even more excited about it than I was. (And I was excited!) We'll see where it goes from here.</p> <p>Organizing people to do this stuff doesn't come easy to me. I enjoy doing it to a degree, but every night before an event I have trouble sleeping. Worried about embarrassing myself. When the event happens though, and people are happy to be there to chat with everyone else, as they invariably have been, it makes it worthwhile.</p> <h3 id="everyone-want-community">Everyone wants community</h3><p>Something I realized along the way is that people (maybe devs especially, I don't know) are looking for community. And when I have noticed there seems to be a missing flashpoint (a topic, a career focus, a book, etc.) 
for community, it's been pretty easy to get people together around it.</p> <h3 id="the-lifecycle-of-groups">The lifecycle of groups</h3><p>Groups, meetups, naturally live and die. Organizers get burnt out. I don't see this as a problem. It's just the way it is.</p> <p>At some point I'll get burnt out too. Or I'll get pickier. For example, I've been avoiding starting a systems programming meetup in NYC because I know it will be a big effort. So I've done lower effort groups like book clubs and coffee clubs.</p> <p>Don't worry about signing yourself up for indefinite work. Just do whatever you'd like to and don't feel bad if you have to stop. Someone else will eventually start the next great group, even if it comes in a different medium or flavor.</p> <h3 id="community-is-contagious">Community is contagious</h3><p>There are great communities out there that have inspired me.</p> <ul> <li>Aleksey Charapko's and Murat Demirbas's virtual <a href="https://charap.co/reading-group/">Distributed Systems Reading Group</a></li> <li>Alex Petrov's <a href="https://twitter.com/ifesdjeen">database paper reading group</a></li> <li>Andy Pavlo's <a href="https://db.cs.cmu.edu/seminar2023/">database interview series</a></li> <li>Paul Butler's <a href="https://browsertech.com/nyc">BrowserTech meetup</a></li> <li>Eric Zhang's <a href="https://twitter.com/ekzhang1/status/1700993939841716254">New York Systems Reading Group</a></li> </ul> <p>And this year I've been hearing about more.</p> <ul> <li>TU Munich students <a href="https://www.tumuchdata.club/">started a Student Database Group</a></li> <li>A group of developers <a href="https://twitter.com/Keleesssss/status/1720466270032691460">starting a Türkiye-language CS reading group</a></li> </ul> <p>There are yet a few more systems programming groups I've heard rumors about being started on the US West Coast and Stockholm.</p> <h3 id="do-whatever-you-want!">Do whatever you want!</h3><p>If you feel like you can't find the right group or 
that you don't fit in with existing groups or that you're missing a moment, there are surely other folks in the same boat. Waiting for a new group to join. You may be the catalyst.</p> <p>There's enormous potential for getting people together and doing something interesting and there isn't necessarily anyone telling you you should. Things you try may work and they may not. The more you try the more you'll learn what works and what doesn't. I've had a few years of <a href="https://notes.eatonphil.com/eight-years-of-tech-meetups.html">making mistakes organizing</a> to hone the sense.</p> <p>The only boring thing to do is to necessarily limit yourself to the sort of thing others have done before! Run a browser meetup instead of a React meetup. Interview hardware developers to teach software developers something. Get software developers with 20 years of experience in niche fields to teach the rest of us something. Read books beyond SICP or Clean Code. Try difficult programming projects.</p> <p>Whatever you want though, don't let me deter you. If you think something should exist, give it a shot!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I used to struggle to get much out of meetups, couldn&#39;t pick up the flow of IRC. Some point I stopped trying solely to fit in. Instead to do what I thought was interesting. 
And to my surprise, folks were interested in coming along too!<br><br>Make your own way<a href="https://t.co/tVEa2ndiZm">https://t.co/tVEa2ndiZm</a> <a href="https://t.co/piWSsv14lj">pic.twitter.com/piWSsv14lj</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1740150745931149471?ref_src=twsrc%5Etfw">December 27, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-12-26-make-your-own-way.htmlWed, 27 Dec 2023 00:00:00 +0000Exploring a Postgres query planhttp://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.html<p><!-- -*- mode: markdown -*- --></p> <p>I learned this week that you can intercept and redirect Postgres query execution. You can hook into the execution layer so you're given a query plan and you get to decide what to do with it. What rows to return, if any, and where they come from.</p> <p>That's very interesting. So I started writing code to explore execution hooks. However, I got stuck interpreting the query plan. Either there's no query plan walking infrastructure or I just didn't find it.</p> <p>So this post is a digression into walking a Postgres query plan. By the end we'll be able to run <code>psql -c 'SELECT a FROM x WHERE a &gt; 1'</code> and reconstruct the entire SQL string from a Postgres <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/executor/execdesc.h#L33"><code>QueryDesc</code></a> object, the query plan object Postgres builds.</p> <p>With that query plan walking infrastructure in place, we'll be in a good state to not just print out the query plan while walking it but instead to translate the query plan or evaluate it in our own way (e.g. 
over column-wise data, or <a href="https://github.com/citusdata/postgres_vectorization_test">vectorized execution over row-wise data</a>).</p> <p>Code for this project is <a href="https://github.com/eatonphil/pgexec">available on Github</a>.</p> <h3 id="what-is-a-query-plan?">What is a query plan?</h3><p>If you're familiar with parsers and compilers, a query plan is like an intermediate representation (IR) of a program. It is not as raw as an abstract syntax tree (AST); it has already been optimized.</p> <p>If that doesn't mean anything to you, think of a query plan as a structured and optimized version of the SQL query you submit to your database. It isn't text anymore. It is <a href="https://buttondown.email/jaffray/archive/why-are-query-plans-trees/">probably a tree</a>.</p> <p>Check out another Justin Jaffray <a href="https://justinjaffray.com/what-is-a-query-optimizer-for/">article on the subject</a> for more detail.</p> <h3 id="development-environment">Development environment</h3><p>Before we get to walking the query plan, let's set up the infrastructure to intercept query execution where we can eventually add in our print debugging of the query plan reconstructed as a SQL string.</p> <p>Once you've got <a href="https://wiki.postgresql.org/wiki/Compile_and_Install_from_source_code">Postgres build dependencies</a>, build and install a debug version of Postgres:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/postgres/postgres<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>postgres $<span class="w"> </span><span class="c1"># Make sure you&#39;re on the same commit I&#39;m on, just to be safe.</span> $<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>b218fbb7a35fcf31539bfad12732038fe082a2eb $<span class="w"> </span>./configure<span class="w"> 
</span>--enable-cassert<span class="w"> </span>--enable-debug<span class="w"> </span><span class="nv">CFLAGS</span><span class="o">=</span><span class="s2">&quot;-ggdb -Og -g3 -fno-omit-frame-pointer&quot;</span> $<span class="w"> </span>make<span class="w"> </span>-j8 $<span class="w"> </span><span class="c1"># Installs to /usr/local/pgsql/bin.</span> $<span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install </pre></div> <p>I'm not going to cover Postgres extension infrastructure in detail. I wrote a bit about it in <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">my last post</a>. You need only read the first half, if at all; not the actual Table Access Method implementation.</p> <p>It will be even simpler in this post because Postgres hooks are extensions but not extensions you install with <code>CREATE EXTENSION</code>. If you want to read about the different kinds of Postgres extensions, check out <a href="https://tembo.io/blog/four-types-of-extensions/">this article</a> by Steven Miller.</p> <p>The minimum we need, aside from the hook code itself, is a Makefile that uses <a href="https://www.postgresql.org/docs/current/extend-pgxs.html">PGXS</a>:</p> <div class="highlight"><pre><span></span><span class="nv">MODULES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgexec <span class="nv">PG_CONFIG</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>/usr/local/pgsql/bin/pg_config <span class="nv">PGXS</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">$(</span>shell<span class="w"> </span><span class="k">$(</span>PG_CONFIG<span class="k">)</span><span class="w"> </span>--pgxs<span class="k">)</span> <span class="cp">include $(PGXS)</span> </pre></div> <p>The <code>MODULES</code> value there corresponds to the C file we'll create shortly, <code>pgexec.c</code>.</p> <p class="note"> This 
<code>pg_config</code> binary path is important because you might have different versions of Postgres installed, for example by your package manager. It is important that the extension is built against the same version of Postgres which will load the extension. </p><p>Now we're ready for some hook code.</p> <h3 id="intercepting-query-execution">Intercepting query execution</h3><p>You can find the basic structure of a hook (and which hooks are available) in Tamika Nomara's <a href="https://github.com/taminomara/psql-hooks">unofficial Postgres hooks docs</a>.</p> <p class="note"> There is no official central place describing all hooks I could find in Postgres docs. Some hooks are described in various places throughout the docs though. </p><p>Based on that page, we can write a bare minimum hook that will intercept queries, log when we've done so, and pass control back to the standard execution path for the actual query. In <code>pgexec.c</code>:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;postgres.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;fmgr.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;executor/executor.h&quot;</span> <span class="n">PG_MODULE_MAGIC</span><span class="p">;</span> <span class="k">static</span><span class="w"> </span><span class="n">ExecutorRun_hook_type</span><span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] HOOKED SUCCESSFULLY!&quot;</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">pgexec_run_hook</span><span class="p">(</span> <span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span> <span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span> <span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">count</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">execute_once</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">print_plan</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">_PG_init</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">standard_ExecutorRun</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgexec_run_hook</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">_PG_fini</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ExecutorRun_hook</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">prev_executor_run_hook</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>You can discover the <code>standard_ExecutorRun</code> function from a quick <code>git grep ExecutorRun_hook</code> in the Postgres source which leads to <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/executor/execMain.c#L306">src/backend/executor/execMain.c#L306</a>:</p> <div class="highlight"><pre><span></span><span class="kt">void</span> <span class="nf">ExecutorRun</span><span class="p">(</span><span class="n">QueryDesc</span><span class="w"> </span><span class="o">*</span><span class="n">queryDesc</span><span class="p">,</span> <span class="w"> </span><span 
class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">count</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">execute_once</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">ExecutorRun_hook</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">ExecutorRun_hook</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span> <span class="w"> </span><span class="k">else</span> <span class="w"> </span><span class="n">standard_ExecutorRun</span><span class="p">(</span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span><span class="w"> </span><span class="n">count</span><span class="p">,</span><span class="w"> </span><span class="n">execute_once</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>So our hook will just log and pass back execution to the existing execution hook. 
Let's build and install the extension.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make <span class="gp">$ </span>sudo<span class="w"> </span>make<span class="w"> </span>install </pre></div> <p>Now we'll create a new database and tell it to load the extension.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/initdb<span class="w"> </span>test-db <span class="gp">$ </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;shared_preload_libraries = &#39;pgexec&#39;&quot;</span><span class="w"> </span>&gt;<span class="w"> </span>test-db/postgresql.conf </pre></div> <p class="note"> Remember, hooks are not <code>CREATE EXTENSION</code> extensions. As far as I can tell they can't be dynamically loaded (without some additional dynamic loading infrastructure one could potentially write). So every time you make a change you need to rebuild the extension, reinstall it, and restart the Postgres server. </p><p>And start the server in the foreground:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/postgres<span class="w"> </span><span class="se">\</span> <span class="w"> </span>--config-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db/postgresql.conf<span class="w"> </span><span class="se">\</span> <span class="w"> </span>-D<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db <span class="go"> -k $(pwd)/test-db</span> <span class="go">2023-11-18 19:35:16.680 GMT [3215547] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv6 address &quot;::1&quot;, port 5432</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv4 address &quot;127.0.0.1&quot;, 
port 5432</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on Unix socket &quot;/tmp/.s.PGSQL.5432&quot;</span> <span class="go">2023-11-18 19:35:16.682 GMT [3215550] LOG: database system was shut down at 2023-11-18 19:20:16 GMT</span> <span class="go">2023-11-18 19:35:16.684 GMT [3215547] LOG: database system is ready to accept connections</span> </pre></div> <p>Keep an eye on this foreground process since this is where <code>elog(LOG, ...)</code> calls will show up.</p> <p>Now in a new window, create a <code>test.sql</code> script that we can use to exercise the hook:</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">x</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span> <span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">309</span><span class="p">);</span> <span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> </pre></div> <p>Run <code>psql</code> so we can trigger the hook:</p> <div class="highlight"><pre><span></span><span class="gp">$ 
</span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">DROP TABLE</span> <span class="go">CREATE TABLE</span> <span class="go">INSERT 0 1</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 309</span> <span class="gp gp-VirtualEnv">(1 row)</span> </pre></div> <p>And in the <code>postgres</code> foreground process you should see a log:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 17:42:03.045 GMT [3242321] LOG: [pgexec] HOOKED SUCCESSFULLY!</span> <span class="go">2023-11-19 17:42:03.045 GMT [3242321] STATEMENT: INSERT INTO x VALUES (309);</span> <span class="go">2023-11-19 17:42:03.045 GMT [3242321] LOG: [pgexec] HOOKED SUCCESSFULLY!</span> <span class="go">2023-11-19 17:42:03.045 GMT [3242321] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>That's our hook! Interestingly only the <code>INSERT</code> and <code>SELECT</code> statements show up, not the <code>DROP</code> and <code>CREATE</code>.</p> <p>Now let's see if we can reconstruct the query text from that first argument, the <code>QueryDesc*</code> that <code>pgexec_run_hook</code> receives. And let's simplify things for ourselves and only worry about reconstructing a <code>SELECT</code> query.</p> <h3 id="<code>node</code>s-and-<code>datum</code>s"><code>Node</code>s and <code>Datum</code>s</h3><p>But first, let's talk about two fundamental ways data in Postgres (code) is organized.</p> <p>Postgres code is extremely dynamic and, maybe relatedly, fairly object-oriented. Almost every entity in Postgres is a <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L128"><code>Node</code></a>. 
Values in Postgres that are exposed to users of Postgres, on the other hand, are <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/postgres.h#L64"><code>Datum</code></a>s.</p> <p>Each node has a type, <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L26"><code>NodeTag</code></a>, that we can switch on to decide what to do. In contrast, a <code>Datum</code> carries no type information. The type of the <code>Datum</code> must be known from context before using one of the transform functions like <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/postgres.h#L90"><code>DatumGetBool</code></a> to retrieve a C value from a <code>Datum</code>.</p> <p>A table is a <code>Node</code>. A query plan is a <code>Node</code>. A sequential scan is a <code>Node</code>. A join is a <code>Node</code>. A literal in a query is a <code>Node</code>. The value for the literal in a query is a <code>Datum</code>.</p> <p>Here is how The Internals of PostgreSQL book <a href="https://www.interdb.jp/pg/pgsql03.html">visualizes</a> a query plan, for example:</p> <p><img src="https://www.interdb.jp/pg/img/fig-3-04.png" alt="https://www.interdb.jp/pg/img/fig-3-04.png"></p> <p>Every box in that image is a <code>Node</code>.</p> <p>And all the <code>Node</code>s I've seen in the code share a common definition prefix like this:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">SomeThing</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">)</span><span class="w"> </span><span class="c1">// If the node is indeed abstract in the OOP sense.</span> <span class="w"> </span><span class="n">NodeTag</span><span class="w"> 
</span><span class="n">type</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Many <code>Node</code>s you'll see are abstract, like <code>Plan</code>. But by printing out <code>NodeTag</code> and checking the value printed in <code>src/include/nodes/nodetags.h</code>, you can find the concrete type of the <code>Node</code>.</p> <p><code>src/include/nodes/nodetags.h</code> is generated during a preprocessing step. (Don't look if <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/nodes/gen_node_support.pl">regex in Perl</a> worries you).</p> <p>We'll get back to <code>Node</code>s later.</p> <h3 id="what's-in-a-<code>querydesc</code>?">What's in a <code>QueryDesc</code>?</h3><p>Let's take a look at the <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/executor/execdesc.h#L33"><code>QueryDesc</code></a> struct:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">QueryDesc</span> <span class="p">{</span> <span class="w"> </span><span class="cm">/* These fields are provided by CreateQueryDesc */</span> <span class="w"> </span><span class="n">CmdType</span><span class="w"> </span><span class="n">operation</span><span class="p">;</span><span class="w"> </span><span class="cm">/* CMD_SELECT, CMD_UPDATE, etc. 
*/</span> <span class="w"> </span><span class="n">PlannedStmt</span><span class="w"> </span><span class="o">*</span><span class="n">plannedstmt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* planner&#39;s output (could be utility, too) */</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">sourceText</span><span class="p">;</span><span class="w"> </span><span class="cm">/* source text of the query */</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">;</span><span class="w"> </span><span class="cm">/* snapshot to use for query */</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck_snapshot</span><span class="p">;</span><span class="w"> </span><span class="cm">/* crosscheck for RI update/delete */</span> <span class="w"> </span><span class="n">DestReceiver</span><span class="w"> </span><span class="o">*</span><span class="n">dest</span><span class="p">;</span><span class="w"> </span><span class="cm">/* the destination for tuple output */</span> <span class="w"> </span><span class="n">ParamListInfo</span><span class="w"> </span><span class="n">params</span><span class="p">;</span><span class="w"> </span><span class="cm">/* param values being passed in */</span> <span class="w"> </span><span class="n">QueryEnvironment</span><span class="w"> </span><span class="o">*</span><span class="n">queryEnv</span><span class="p">;</span><span class="w"> </span><span class="cm">/* query environment passed in */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">instrument_options</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OR of InstrumentOption flags */</span> <span class="w"> </span><span class="cm">/* These fields 
are set by ExecutorStart */</span> <span class="w"> </span><span class="n">TupleDesc</span><span class="w"> </span><span class="n">tupDesc</span><span class="p">;</span><span class="w"> </span><span class="cm">/* descriptor for result tuples */</span> <span class="w"> </span><span class="n">EState</span><span class="w"> </span><span class="o">*</span><span class="n">estate</span><span class="p">;</span><span class="w"> </span><span class="cm">/* executor&#39;s query-wide state */</span> <span class="w"> </span><span class="n">PlanState</span><span class="w"> </span><span class="o">*</span><span class="n">planstate</span><span class="p">;</span><span class="w"> </span><span class="cm">/* tree of per-plan-node state */</span> <span class="w"> </span><span class="cm">/* This field is set by ExecutorRun */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">already_executed</span><span class="p">;</span><span class="w"> </span><span class="cm">/* true if previously executed */</span> <span class="w"> </span><span class="cm">/* This is always set NULL by the core system, but plugins can change it */</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Instrumentation</span><span class="w"> </span><span class="o">*</span><span class="n">totaltime</span><span class="p">;</span><span class="w"> </span><span class="cm">/* total time spent in ExecutorRun */</span> <span class="p">}</span><span class="w"> </span><span class="n">QueryDesc</span><span class="p">;</span> </pre></div> <p>The <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L46"><code>PlannedStmt</code></a> field looks interesting. 
Let's take a look:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">PlannedStmt</span> <span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">no_equal</span><span class="p">,</span><span class="w"> </span><span class="n">no_query_jumble</span><span class="p">)</span> <span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> <span class="w"> </span><span class="n">CmdType</span><span class="w"> </span><span class="n">commandType</span><span class="p">;</span><span class="w"> </span><span class="cm">/* select|insert|update|delete|merge|utility */</span> <span class="w"> </span><span class="n">uint64</span><span class="w"> </span><span class="n">queryId</span><span class="p">;</span><span class="w"> </span><span class="cm">/* query identifier (copied from Query) */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">hasReturning</span><span class="p">;</span><span class="w"> </span><span class="cm">/* is it insert|update|delete RETURNING? */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">hasModifyingCTE</span><span class="p">;</span><span class="w"> </span><span class="cm">/* has insert|update|delete in WITH? */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">canSetTag</span><span class="p">;</span><span class="w"> </span><span class="cm">/* do I set the command result tag? */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">transientPlan</span><span class="p">;</span><span class="w"> </span><span class="cm">/* redo plan when TransactionXmin changes? 
*/</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">dependsOnRole</span><span class="p">;</span><span class="w"> </span><span class="cm">/* is plan specific to current role? */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">parallelModeNeeded</span><span class="p">;</span><span class="w"> </span><span class="cm">/* parallel mode required to execute? */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">jitFlags</span><span class="p">;</span><span class="w"> </span><span class="cm">/* which forms of JIT should be performed */</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">planTree</span><span class="p">;</span><span class="w"> </span><span class="cm">/* tree of Plan nodes */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">rtable</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of RangeTblEntry nodes */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">permInfos</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of RTEPermissionInfo nodes for rtable</span> <span class="cm"> * entries needing one */</span> <span class="w"> </span><span class="cm">/* rtable indexes of target relations for INSERT/UPDATE/DELETE/MERGE */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">resultRelations</span><span class="p">;</span><span class="w"> </span><span class="cm">/* integer list of RT indexes, or NIL */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span 
class="n">appendRelations</span><span class="p">;</span><span class="w"> </span><span class="cm">/* list of AppendRelInfo nodes */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">subplans</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Plan trees for SubPlan expressions; note</span> <span class="cm"> * that some could be NULL */</span> <span class="w"> </span><span class="n">Bitmapset</span><span class="w"> </span><span class="o">*</span><span class="n">rewindPlanIDs</span><span class="p">;</span><span class="w"> </span><span class="cm">/* indices of subplans that require REWIND */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">rowMarks</span><span class="p">;</span><span class="w"> </span><span class="cm">/* a list of PlanRowMark&#39;s */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">relationOids</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OIDs of relations the plan depends on */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">invalItems</span><span class="p">;</span><span class="w"> </span><span class="cm">/* other dependencies, as PlanInvalItems */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">paramExecTypes</span><span class="p">;</span><span class="w"> </span><span class="cm">/* type OIDs for PARAM_EXEC Params */</span> <span class="w"> </span><span class="n">Node</span><span class="w"> </span><span class="o">*</span><span class="n">utilityStmt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* non-null if this is utility stmt */</span> <span class="w"> </span><span class="cm">/* statement location in source string 
(copied from Query) */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">stmt_location</span><span class="p">;</span><span class="w"> </span><span class="cm">/* start location, or -1 if unknown */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">stmt_len</span><span class="p">;</span><span class="w"> </span><span class="cm">/* length in bytes; 0 means &quot;rest of string&quot; */</span> <span class="p">}</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="p">;</span> </pre></div> <p>The <code>struct Plan* planTree</code> field in there looks like what we'd want. But <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L119"><code>Plan</code></a> is abstract:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span> <span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">,</span><span class="w"> </span><span class="n">no_equal</span><span class="p">,</span><span class="w"> </span><span class="n">no_query_jumble</span><span class="p">)</span> <span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> </pre></div> <p>So let's try printing out the <code>planTree-&gt;type</code> field to find out which concrete <code>Node</code> it is. 
In <code>pgexec.c</code> change the definition of <code>print_plan</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] HOOKED SUCCESSFULLY! %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">planTree</span><span class="o">-&gt;</span><span class="n">type</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make <span class="gp">$ </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/postgres<span class="w"> </span><span class="se">\</span> <span class="w"> </span>--config-file<span class="o">=</span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db/postgresql.conf<span class="w"> </span><span class="se">\</span> <span class="w"> </span>-D<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/test-db <span class="go"> -k $(pwd)/test-db</span> <span class="go">2023-11-18 19:35:16.680 GMT [3215547] LOG: starting PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1), 64-bit</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv6 address 
&quot;::1&quot;, port 5432</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on IPv4 address &quot;127.0.0.1&quot;, port 5432</span> <span class="go">2023-11-18 19:35:16.681 GMT [3215547] LOG: listening on Unix socket &quot;/tmp/.s.PGSQL.5432&quot;</span> <span class="go">2023-11-18 19:35:16.682 GMT [3215550] LOG: database system was shut down at 2023-11-18 19:20:16 GMT</span> <span class="go">2023-11-18 19:35:16.684 GMT [3215547] LOG: database system is ready to accept connections</span> </pre></div> <p>And in another window run <code>psql</code>:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql </pre></div> <p>And check the logs from the <code>postgres</code> process we just started and you should notice:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 17:46:18.834 GMT [3242495] LOG: [pgexec] HOOKED SUCCESSFULLY! 322</span> <span class="go">2023-11-19 17:46:18.834 GMT [3242495] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>So <code>322</code> is the <code>NodeTag</code> for the <code>Plan</code>. If we look that up in Postgres's <code>src/include/nodes/nodetags.h</code> (remember, this is generated after <code>./configure &amp;&amp; make</code> so I can't link you to it):</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">&#39; = 322&#39;</span><span class="w"> </span>src/include/nodes/nodetags.h <span class="go"> T_SeqScan = 322,</span> </pre></div> <p>Hey, that makes sense! 
A <code>SELECT</code> without any indexes definitely sounds like a sequential scan!</p> <h3 id="walking-a-sequential-scan">Walking a sequential scan</h3><p>Let's take a look at the <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L394"><code>SeqScan</code></a> struct:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">SeqScan</span> <span class="p">{</span> <span class="w"> </span><span class="n">Scan</span><span class="w"> </span><span class="n">scan</span><span class="p">;</span> <span class="p">}</span><span class="w"> </span><span class="n">SeqScan</span><span class="p">;</span> </pre></div> <p>Ok, that's not very interesting. Let's look at <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/plannodes.h#L382"><code>Scan</code></a> then:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Scan</span> <span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">abstract</span><span class="p">)</span> <span class="w"> </span><span class="n">Plan</span><span class="w"> </span><span class="n">plan</span><span class="p">;</span> <span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">scanrelid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* relid is index into the range table */</span> <span class="p">}</span><span class="w"> </span><span class="n">Scan</span><span class="p">;</span> </pre></div> <p>That's interesting! <code>scanrelid</code> represents the table we're scanning. I don't know what "range table" means exactly. 
But there was a field on the <code>PlannedStmt</code> called <code>rtable</code> that seems relevant.</p> <p><code>rtable</code> was described as a <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L53"><code>List</code></a> of <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/parsenodes.h#L1019"><code>RangeTblEntry</code></a> nodes. And browsing around the file where <code>List</code> is defined we can see some nice methods for working with <code>List</code>s, like <code>list_length()</code>.</p> <p>Let's print out the <code>scanrelid</code> and let's check out the length of the <code>rtable</code> and see if it's filled out. Let's also restrict our <code>print_plan</code> code to only look at <code>SeqScan</code> nodes. In <code>pgexec.c</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">planTree</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] Unsupported plan type.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] relid: %d, rtable length: %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-&gt;</span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="p">,</span><span class="w"> </span><span class="n">list_length</span><span class="p">(</span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">rtable</span><span class="p">));</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. (You can find the instructions for this above if you've forgotten.) Re-run the <code>test.sql</code> script. And check the Postgres server logs. 
You should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 18:00:34.184 GMT [3244438] LOG: [pgexec] relid: 1, rtable length: 1</span> <span class="go">2023-11-19 18:00:34.184 GMT [3244438] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Awesome! So <code>rtable</code> does have data in it. There's only one table in this query, so a length of <code>1</code> makes sense. The <code>scanrelid</code> also being <code>1</code> looks odd at first, but range table indexes are 1-based. So let's fetch the nth value from the <code>rtable</code> list using <code>scanrelid-1</code> as the index.</p> <p>For the <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/parsenodes.h#L1019"><code>RangeTblEntry</code></a> itself, let's take a look:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="n">RTEKind</span> <span class="p">{</span> <span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">,</span><span class="w"> </span><span class="cm">/* ordinary relation reference */</span> <span class="w"> </span><span class="n">RTE_SUBQUERY</span><span class="p">,</span><span class="w"> </span><span class="cm">/* subquery in FROM */</span> <span class="w"> </span><span class="n">RTE_JOIN</span><span class="p">,</span><span class="w"> </span><span class="cm">/* join */</span> <span class="w"> </span><span class="n">RTE_FUNCTION</span><span class="p">,</span><span class="w"> </span><span class="cm">/* function in FROM */</span> <span class="w"> </span><span class="n">RTE_TABLEFUNC</span><span class="p">,</span><span class="w"> </span><span class="cm">/* TableFunc(.., column list) */</span> <span class="w"> </span><span class="n">RTE_VALUES</span><span class="p">,</span><span class="w"> </span><span class="cm">/* VALUES (&lt;exprlist&gt;), (&lt;exprlist&gt;), ... 
*/</span> <span class="w"> </span><span class="n">RTE_CTE</span><span class="p">,</span><span class="w"> </span><span class="cm">/* common table expr (WITH list element) */</span> <span class="w"> </span><span class="n">RTE_NAMEDTUPLESTORE</span><span class="p">,</span><span class="w"> </span><span class="cm">/* tuplestore, e.g. for AFTER triggers */</span> <span class="w"> </span><span class="n">RTE_RESULT</span><span class="p">,</span><span class="w"> </span><span class="cm">/* RTE represents an empty FROM clause; such</span> <span class="cm"> * RTEs are added by the planner, they&#39;re not</span> <span class="cm"> * present during parsing or rewriting */</span> <span class="p">}</span><span class="w"> </span><span class="n">RTEKind</span><span class="p">;</span> <span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">RangeTblEntry</span> <span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">custom_read_write</span><span class="p">,</span><span class="w"> </span><span class="n">custom_query_jumble</span><span class="p">)</span> <span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> <span class="w"> </span><span class="n">RTEKind</span><span class="w"> </span><span class="n">rtekind</span><span class="p">;</span><span class="w"> </span><span class="cm">/* see above */</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * XXX the fields applicable to only some rte kinds should be merged into</span> <span class="cm"> * a union. I didn&#39;t do this yet because the diffs would impact a lot of</span> <span class="cm"> * code that is being actively worked on. 
FIXME someday.</span> <span class="cm"> */</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * Fields valid for a plain relation RTE (else zero):</span> <span class="cm"> *</span> <span class="cm"> * rellockmode is really LOCKMODE, but it&#39;s declared int to avoid having</span> <span class="cm"> * to include lock-related headers here. It must be RowExclusiveLock if</span> <span class="cm"> * the RTE is an INSERT/UPDATE/DELETE/MERGE target, else RowShareLock if</span> <span class="cm"> * the RTE is a SELECT FOR UPDATE/FOR SHARE target, else AccessShareLock.</span> <span class="cm"> *</span> <span class="cm"> * Note: in some cases, rule expansion may result in RTEs that are marked</span> <span class="cm"> * with RowExclusiveLock even though they are not the target of the</span> <span class="cm"> * current query; this happens if a DO ALSO rule simply scans the original</span> <span class="cm"> * target table. We leave such RTEs with their original lockmode so as to</span> <span class="cm"> * avoid getting an additional, lesser lock.</span> <span class="cm"> *</span> <span class="cm"> * perminfoindex is 1-based index of the RTEPermissionInfo belonging to</span> <span class="cm"> * this RTE in the containing struct&#39;s list of same; 0 if permissions need</span> <span class="cm"> * not be checked for this RTE.</span> <span class="cm"> *</span> <span class="cm"> * As a special case, relid, relkind, rellockmode, and perminfoindex can</span> <span class="cm"> * also be set (nonzero) in an RTE_SUBQUERY RTE. This occurs when we</span> <span class="cm"> * convert an RTE_RELATION RTE naming a view into an RTE_SUBQUERY</span> <span class="cm"> * containing the view&#39;s query. 
We still need to perform run-time locking</span> <span class="cm"> * and permission checks on the view, even though it&#39;s not directly used</span> <span class="cm"> * in the query anymore, and the most expedient way to do that is to</span> <span class="cm"> * retain these fields from the old state of the RTE.</span> <span class="cm"> *</span> <span class="cm"> * As a special case, RTE_NAMEDTUPLESTORE can also set relid to indicate</span> <span class="cm"> * that the tuple format of the tuplestore is the same as the referenced</span> <span class="cm"> * relation. This allows plans referencing AFTER trigger transition</span> <span class="cm"> * tables to be invalidated if the underlying table is altered.</span> <span class="cm"> */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">relid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* OID of the relation */</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">relkind</span><span class="p">;</span><span class="w"> </span><span class="cm">/* relation kind (see pg_class.relkind) */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">rellockmode</span><span class="p">;</span><span class="w"> </span><span class="cm">/* lock level that query requires on the rel */</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">TableSampleClause</span><span class="w"> </span><span class="o">*</span><span class="n">tablesample</span><span class="p">;</span><span class="w"> </span><span class="cm">/* sampling info, or NULL */</span> <span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">perminfoindex</span><span class="p">;</span> </pre></div> <p>In <code>SELECT a FROM x</code>, <code>x</code> should be a plain relation RTE (to use the terminology there). 
So we can add a guard that validates that. But we don't get a <code>Relation</code>. (You might remember from my <a href="https://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html">previous post</a> that <code>Relation</code> is where we can finally see the table name.)</p> <p>We get an <code>Oid</code> for the <code>Relation</code>. So we need to find a way to look up a <code>Relation</code> from an <code>Oid</code>. And by grepping around in Postgres (or via judicious use of ChatGPT, I confess), we can notice <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/utils/cache/relcache.c#L2056"><code>RelationIdGetRelation</code></a> that takes an <code>Oid</code> and returns a <code>Relation</code>. Notice also that the comment says we should close the relation with <code>RelationClose</code> when we're done.</p> <p>So putting it all together (and again, reusing some code from that previous post), we can print out the table name.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span 
class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">planTree</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] Unsupported plan type.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span> <span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">queryDesc</span><span
class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-&gt;</span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="mi">-1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] Unsupported FROM type: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RelationIdGetRelation</span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">relid</span><span class="p">);</span> <span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">relation</span><span class="o">-&gt;</span><span class="n">rd_rel</span><span class="o">-&gt;</span><span class="n">relname</span><span class="p">);</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] SELECT [todo] FROM %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">tablename</span><span class="p">);</span> <span class="w"> </span><span class="n">RelationClose</span><span class="p">(</span><span class="n">relation</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>You'll also need to add a new <code>#include</code> for <code>utils/rel.h</code>.</p> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 18:36:03.986 GMT [3246777] LOG: [pgexec] SELECT [todo] FROM x</span> <span class="go">2023-11-19 18:36:03.986 GMT [3246777] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Fantastic! Before we get into walking the <code>SELECT</code> columns and the (optional) <code>WHERE</code> clause, let's do some quick refactoring.</p> <h3 id="a-string-builder">A string builder</h3><p>Let's add a little string builder library so we can build up a single string and emit it with a single <code>elog()</code> call.</p> <p>I wrote this ahead of time and won't explain it here since the details aren't relevant.</p> <p>Just copy this and paste it near the top of <code>pgexec.c</code>:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span> <span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">len</span><span class="p">;</span> <span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">offset</span><span class="p">;</span> <span
class="p">}</span><span class="w"> </span><span class="n">PGExec_Buffer</span><span class="p">;</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_init</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_resize_to_fit_additional</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">additional</span><span 
class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">newsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">additional</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">additional</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">newsize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">additional</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="w"> </span><span class="n">new</span><span class="w"> </span><span class="o">=</span><span class="w">
</span><span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">newsize</span><span class="p">);</span> <span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">new</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">);</span> <span class="w"> </span><span class="n">memcpy</span><span class="p">(</span><span class="n">new</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">));</span> <span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">);</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newsize</span><span class="p">;</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_append</span><span class="p">(</span><span 
class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="p">);</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendz</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_append</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">strlen</span><span class="p">(</span><span class="n">c</span><span class="p">));</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_append</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">chars</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_resize_to_fit_additional</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span> <span class="w"> 
</span><span class="n">memcpy</span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">chars</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendf</span><span class="p">(</span> <span class="w"> </span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="o">*</span><span class="p">,</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="kr">restrict</span><span class="p">,</span> <span class="w"> </span><span class="p">...</span> <span class="p">)</span><span class="w"> </span><span class="n">__attribute__</span><span class="w"> </span><span class="p">((</span><span class="n">format</span><span class="w"> </span><span class="p">(</span><span class="n">gnu_printf</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">)));</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_appendf</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="o">*</span><span class="n">buf</span><span 
class="p">,</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="kr">restrict</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="p">...)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// First figure out how long the result will be.</span> <span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kt">va_list</span><span class="w"> </span><span class="n">arglist</span><span class="p">;</span> <span class="w"> </span><span class="n">va_start</span><span class="p">(</span><span class="n">arglist</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">);</span> <span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vsnprintf</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="n">arglist</span><span class="p">);</span> <span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">chars</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// TODO: error handling.</span> <span class="w"> </span><span class="c1">// Resize to fit result.</span> <span class="w"> </span><span class="n">buffer_resize_to_fit_additional</span><span class="p">(</span><span 
class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">chars</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Actually do the printf into buf.</span> <span class="w"> </span><span class="n">va_end</span><span class="p">(</span><span class="n">arglist</span><span class="p">);</span> <span class="w"> </span><span class="n">va_start</span><span class="p">(</span><span class="n">arglist</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">);</span> <span class="w"> </span><span class="n">chars</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">vsprintf</span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="p">,</span><span class="w"> </span><span class="n">arglist</span><span class="p">);</span> <span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">chars</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span><span class="c1">// TODO: error handling.</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">chars</span><span class="p">;</span> <span class="w"> </span><span class="n">va_end</span><span class="p">(</span><span class="n">arglist</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span 
class="nf">buffer_cstring</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="kt">size_t</span><span class="w"> </span><span class="n">prev_offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_append</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">zero</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="o">--</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">[</span><span class="n">buf</span><span 
class="o">-&gt;</span><span class="n">offset</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Offset should stay the same. This is a fake NULL.</span> <span class="w"> </span><span class="n">Assert</span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">offset</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">prev_offset</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_free</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">free</span><span class="p">(</span><span class="n">buf</span><span class="o">-&gt;</span><span class="n">mem</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Next we'll modify <code>print_plan()</code> in <code>pgexec.c</code> to use it, and add stubs for printing the <code>SELECT</code> columns and <code>WHERE</code> clauses.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span 
class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; [where todo]&quot;</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_select_columns</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[columns todo]&quot;</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">print_plan</span><span class="p">(</span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">SeqScan</span><span class="o">*</span><span class="w"> </span><span class="n">scan</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">planTree</span><span class="p">;</span> <span class="w"> </span><span class="n">PGExec_Buffer</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="mi">0</span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">T_SeqScan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] Unsupported plan type.&quot;</span><span
class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">SeqScan</span><span class="o">*</span><span class="p">)</span><span class="n">plan</span><span class="p">;</span> <span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="o">-&gt;</span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">scan</span><span class="o">-&gt;</span><span class="n">scan</span><span class="p">.</span><span class="n">scanrelid</span><span class="mi">-1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] Unsupported FROM type: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">buffer_init</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span 
class="p">);</span> <span class="w"> </span><span class="n">relation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RelationIdGetRelation</span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">relid</span><span class="p">);</span> <span class="w"> </span><span class="n">tablename</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">relation</span><span class="o">-&gt;</span><span class="n">rd_rel</span><span class="o">-&gt;</span><span class="n">relname</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;SELECT &quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_print_select_columns</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; FROM %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">tablename</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_print_where</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="p">);</span> <span class="w"> </span><span class="n">elog</span><span 
class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[pgexec] %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">buffer_cstring</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">));</span> <span class="w"> </span><span class="n">RelationClose</span><span class="p">(</span><span class="n">relation</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_free</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Now we just need to implement the <code>buffer_print_where</code> and <code>buffer_print_select_columns</code> functions and our walking infrastructure will be done! For now. :)</p> <h3 id="walking-the-<code>where</code>-clause">Walking the <code>WHERE</code> clause</h3><p>If you remember back to the <code>SeqScan</code> and <code>Scan</code> nodes, they were both basically empty. They had a <code>Plan</code> and a <code>scanrelid</code>. So the rest of the <code>SELECT</code> info must be in the <code>Plan</code> since it wasn't in the <code>Scan</code>.</p> <p>Let's look at <a href="https://github.com/postgres/postgres/blob/master/src/include/nodes/plannodes.h#L119"><code>Plan</code></a> again. 
One part that stands out is:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="cm">/*</span> <span class="cm"> * Common structural data for all Plan types.</span> <span class="cm"> */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">plan_node_id</span><span class="p">;</span><span class="w"> </span><span class="cm">/* unique across entire final plan tree */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">targetlist</span><span class="p">;</span><span class="w"> </span><span class="cm">/* target list to be computed at this node */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">qual</span><span class="p">;</span><span class="w"> </span><span class="cm">/* implicitly-ANDed qual conditions */</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">lefttree</span><span class="p">;</span><span class="w"> </span><span class="cm">/* input plan tree(s) */</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Plan</span><span class="w"> </span><span class="o">*</span><span class="n">righttree</span><span class="p">;</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">initPlan</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Init Plan nodes (un-correlated expr</span> <span class="cm"> * subselects) */</span> </pre></div> <p><code>qual</code> kinda looks like a <code>WHERE</code> clause. 
(And <code>targetlist</code> kinda looks like the columns the <code>SELECT</code> pulls.)</p> <p><a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L53"><code>List</code></a>s just contain void pointers, so we can't tell what type the children of <code>qual</code> or <code>targetlist</code> are. But I'm going to make a wild guess that they are <code>Node</code>s.</p> <p>There's even a nice helper that casts void pointers to <code>Node*</code> and pulls out the type, <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/nodes.h#L133"><code>nodeTag()</code></a>.</p> <p>And reading around <code>pg_list.h</code> shows some interesting helper utilities like <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/pg_list.h#L373"><code>foreach</code></a> that we can use to iterate over the list.</p> <p>Let's try printing out the type of <code>qual</code>'s members.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span
class="p">;</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; WHERE &quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; AND &quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="w"> </span><span 
class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[node: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">)));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p class="note"> Notice any <a href="https://twitter.com/eatonphil/status/1726265982094819631">vestiges of LISP</a>? </p><p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 19:17:00.879 GMT [3250850] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [node: 15]</span> <span class="go">2023-11-19 19:17:00.879 GMT [3250850] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Well, our code didn't crash! So the guess about <code>qual</code> <code>List</code> entries being <code>Node</code>s seems right. Let's look up that node type in the Postgres repo:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">&#39; = 15,&#39;</span><span class="w"> </span>src/include/nodes/nodetags.h <span class="go"> T_OpExpr = 15,</span> </pre></div> <p>Woot! That is exactly what I'd expect the <code>WHERE</code> clause here to be.</p> <p>Now that we know <code>qual</code> is a <code>List</code> of <code>Node</code>s, let's do a bit of refactoring since <code>targetlist</code> will probably also be a <code>List</code> of <code>Node</code>s. 
Back in <code>pgexec.c</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="p">);</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">);</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[opexpr: todo]&quot;</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span 
class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="w"> </span><span class="n">list</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span 
class="o">*</span><span class="w"> </span><span class="n">sep</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span> <span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">list</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span 
class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; WHERE &quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; AND &quot;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And let's check out <code>OpExpr</code>!</p> <h3 id="walking-<code>opexpr</code>">Walking <code>OpExpr</code></h3><p>Take a look at the definition of <a 
href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L748"><code>OpExpr</code></a>:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">OpExpr</span> <span class="p">{</span> <span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* PG_OPERATOR OID of the operator */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opno</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* PG_PROC OID of underlying function */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opfuncid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">equal_ignore_if_zero</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* PG_TYPE OID of result value */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">opresulttype</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* true if operator returns set */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">opretset</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* OID of collation of result */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span 
class="n">opcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* OID of collation that operator should use */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">inputcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* arguments to the operator (1 or 2) */</span> <span class="w"> </span><span class="n">List</span><span class="w"> </span><span class="o">*</span><span class="n">args</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* token location, or -1 if unknown */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span> <span class="p">}</span><span class="w"> </span><span class="n">OpExpr</span><span class="p">;</span> </pre></div> <p>The important fields are <code>opno</code>, the <code>Oid</code> of the operator, and <code>args</code>. <code>args</code> looks like another <code>List</code> of <code>Node</code>s so we already know how to handle that.</p> <p>But how do we find the string name of the operator? Presumably there's infrastructure like <code>RelationIdGetRelation</code> that takes an <code>Oid</code> and gets us an operator object.</p> <p>Well I got stuck here as well. Again, thankfully, ChatGPT gave me some suggestions. There's no great story for how I got it working. 
So here's <code>buffer_print_opexpr</code>.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">HeapTuple</span><span class="w"> </span><span class="n">opertup</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SearchSysCache1</span><span class="p">(</span><span class="n">OPEROID</span><span class="p">,</span><span class="w"> </span><span class="n">ObjectIdGetDatum</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">opno</span><span class="p">));</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">HeapTupleIsValid</span><span class="p">(</span><span class="n">opertup</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Form_pg_operator</span><span class="w"> </span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span
class="p">(</span><span class="n">Form_pg_operator</span><span class="p">)</span><span class="n">GETSTRUCT</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; %s &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">operator</span><span class="o">-&gt;</span><span class="n">oprname</span><span class="p">));</span> <span class="w"> </span><span class="n">ReleaseSysCache</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown operation: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="o">-&gt;</span><span class="n">opno</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// TODO: Support single operand operations.</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)));</span> <span class="p">}</span> </pre></div> <p>And add the following two includes to the top of <code>pgexec.c</code>:</p> <div 
class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;catalog/pg_operator.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;utils/syscache.h&quot;</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 19:42:52.916 GMT [3252974] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [Unknown: 6] &gt; [Unknown: 7]</span> <span class="go">2023-11-19 19:42:52.916 GMT [3252974] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>And we continue to make progress! Let's look up the type of these two unknown nodes.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">&#39; = 6,&#39;</span><span class="w"> </span>src/include/nodes/nodetags.h <span class="go"> T_Var = 6,</span> <span class="gp">$ </span>grep<span class="w"> </span><span class="s1">&#39; = 7,&#39;</span><span class="w"> </span>src/include/nodes/nodetags.h <span class="go"> T_Const = 7,</span> </pre></div> <p>Let's deal with <code>Const</code> first.</p> <h3 id="walking-<code>const</code>">Walking <code>Const</code></h3><p>If we take a look at the <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L292"><code>Const</code></a> definition:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Const</span> <span class="p">{</span> <span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">custom_copy_equal</span><span class="p">,</span><span class="w"> </span><span class="n">custom_read_write</span><span class="p">)</span> 
<span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* pg_type OID of the constant&#39;s datatype */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">consttype</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* typmod value, if any */</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">consttypmod</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* OID of collation, or InvalidOid if none */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">constcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* typlen of the constant&#39;s datatype */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">constlen</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* the constant&#39;s value */</span> <span class="w"> </span><span class="n">Datum</span><span class="w"> </span><span class="n">constvalue</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* whether the constant is null (if true, constvalue is undefined) */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">constisnull</span><span class="w"> </span><span 
class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * Whether this datatype is passed by value. If true, then all the</span> <span class="cm"> * information is stored in the Datum. If false, then the Datum contains</span> <span class="cm"> * a pointer to the information.</span> <span class="cm"> */</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">constbyval</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * token location, or -1 if unknown. All constants are tracked as</span> <span class="cm"> * locations in query jumbling, to be marked as parameters.</span> <span class="cm"> */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="w"> </span><span class="nf">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_location</span><span class="p">);</span> <span class="p">}</span><span class="w"> </span><span class="n">Const</span><span class="p">;</span> </pre></div> <p>It looks like we need to switch on the <code>consttype</code> (an <code>Oid</code>) to figure out how to interpret the <code>constvalue</code> (a <code>Datum</code>). Remember I mentioned earlier that how to interpret a <code>Datum</code> is dependent on context. 
<code>consttype</code> is the context here.</p> <p>In this case, <code>consttype</code> is an <code>Oid</code>, and earlier we had to use Postgres infrastructure to look up an <code>Oid</code>'s corresponding object. But there are some builtin types with fixed <code>Oid</code>s, and the literals we've queried with are among them.</p> <p>We can simply check if <code>consttype == INT4OID</code> and, if so, interpret the <code>Datum</code> as an <code>int32</code>. <code>DatumGetInt32</code> will get us that <code>int32</code>.</p> <p>To support the new <code>Const</code> type, we'll add a case in <code>buffer_print_expr</code> to look for a <code>T_Const</code>.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w">
</span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And add a new function, <code>buffer_print_const</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_const</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Const</span><span class="o">*</span><span class="w"> </span><span class="n">cnst</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">cnst</span><span class="o">-&gt;</span><span class="n">consttype</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span 
class="no">INT4OID</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">DatumGetInt32</span><span class="p">(</span><span class="n">cnst</span><span class="o">-&gt;</span><span class="n">constvalue</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">val</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown consttype oid: %u]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">cnst</span><span class="o">-&gt;</span><span class="n">consttype</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 19:53:47.922 GMT [3253746] LOG: [pgexec] SELECT [columns todo] FROM x WHERE [Unknown: 6] &gt; 1</span> <span class="go">2023-11-19 19:53:47.922 GMT [3253746] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Great! 
Now we just have to tackle <code>T_Var</code>.</p> <h3 id="walking-<code>var</code>">Walking <code>Var</code></h3><p>Let's take a look at the definition of <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L233"><code>Var</code></a>:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">Var</span> <span class="p">{</span> <span class="w"> </span><span class="n">Expr</span><span class="w"> </span><span class="n">xpr</span><span class="p">;</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * index of this var&#39;s relation in the range table, or</span> <span class="cm"> * INNER_VAR/OUTER_VAR/etc</span> <span class="cm"> */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">varno</span><span class="p">;</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * attribute number of this var, or zero for all attrs (&quot;whole-row Var&quot;)</span> <span class="cm"> */</span> <span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">varattno</span><span class="p">;</span> <span class="w"> </span><span class="cm">/* pg_type OID for the type of this var */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">vartype</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* pg_attribute typmod value */</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">vartypmod</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span 
class="w"> </span><span class="cm">/* OID of collation, or InvalidOid if none */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">varcollid</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * RT indexes of outer joins that can replace the Var&#39;s value with null.</span> <span class="cm"> * We can omit varnullingrels in the query jumble, because it&#39;s fully</span> <span class="cm"> * determined by varno/varlevelsup plus the Var&#39;s query location.</span> <span class="cm"> */</span> <span class="w"> </span><span class="n">Bitmapset</span><span class="w"> </span><span class="o">*</span><span class="n">varnullingrels</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * for subquery variables referencing outer relations; 0 in a normal var,</span> <span class="cm"> * &gt;0 means N levels up</span> <span class="cm"> */</span> <span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">varlevelsup</span><span class="p">;</span> <span class="w"> </span><span class="cm">/*</span> <span class="cm"> * varnosyn/varattnosyn are ignored for equality, because Vars with</span> <span class="cm"> * different syntactic identifiers are semantically the same as long as</span> <span class="cm"> * their varno/varattno match.</span> <span class="cm"> */</span> <span class="w"> </span><span class="cm">/* syntactic relation index (0 if unknown) */</span> <span class="w"> </span><span class="n">Index</span><span class="w"> </span><span class="n">varnosyn</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span 
class="n">equal_ignore</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* syntactic attribute number */</span> <span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">varattnosyn</span><span class="w"> </span><span class="n">pg_node_attr</span><span class="p">(</span><span class="n">equal_ignore</span><span class="p">,</span><span class="w"> </span><span class="n">query_jumble_ignore</span><span class="p">);</span> <span class="w"> </span><span class="cm">/* token location, or -1 if unknown */</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span> <span class="p">}</span><span class="w"> </span><span class="n">Var</span><span class="p">;</span> </pre></div> <p>It looks like this refers to a relation in the range table list again. So this means we need to have access to the full <code>PlannedStmt</code> so we can read its <code>rtable</code> field again to find the table. 
Then we need to look up the <code>Relation</code> for the table and then we can use the <code>Var</code>'s <code>varattno</code> field to pick the nth attribute from the relation and get its string representation.</p> <p>However, ChatGPT found a slightly higher-level function: <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/backend/utils/cache/lsyscache.c#L826"><code>get_attname()</code></a> that takes a relation <code>Oid</code> and an attribute index and returns the string name of the column.</p> <p>So here's what <code>buffer_print_var</code> looks like:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_var</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Var</span><span class="o">*</span><span class="w"> </span><span class="n">var</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">stmt</span><span class="o">-&gt;</span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span 
class="o">-&gt;</span><span class="n">varno</span><span class="mi">-1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unsupported relation type for var: %d].&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_attname</span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">relid</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-&gt;</span><span class="n">varattno</span><span class="p">,</span><span class="w"> </span><span class="nb">false</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="n">pfree</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>You'll also need to add another <code>#include</code> for <code>utils/lsyscache.h</code>.</p> <p>Let's add the <code>case T_Var:</code> check in 
<code>buffer_print_expr</code>, and also feed the <code>PlannedStmt*</code> through all the necessary <code>buffer_print_X</code> functions:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="p">);</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="p">);</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_opexpr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">OpExpr</span><span class="o">*</span><span class="w"> </span><span class="n">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">HeapTuple</span><span class="w"> </span><span class="n">opertup</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">SearchSysCache1</span><span class="p">(</span><span class="n">OPEROID</span><span class="p">,</span><span class="w"> </span><span class="n">ObjectIdGetDatum</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">opno</span><span class="p">));</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">HeapTupleIsValid</span><span class="p">(</span><span class="n">opertup</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Form_pg_operator</span><span class="w"> </span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Form_pg_operator</span><span class="p">)</span><span class="n">GETSTRUCT</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; %s &quot;</span><span class="p">,</span><span class="w"> </span><span class="n">NameStr</span><span class="p">(</span><span class="n">operator</span><span class="o">-&gt;</span><span class="n">oprname</span><span class="p">));</span> <span class="w"> </span><span 
class="n">ReleaseSysCache</span><span class="p">(</span><span class="n">opertup</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown operation: %u]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="o">-&gt;</span><span class="n">opno</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// TODO: Support single operand operations.</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">lfirst</span><span class="p">(</span><span class="n">list_nth_cell</span><span class="p">(</span><span class="n">op</span><span class="o">-&gt;</span><span class="n">args</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)));</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_const</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">Const</span><span class="o">*</span><span class="w"> </span><span class="n">cnst</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">cnst</span><span class="o">-&gt;</span><span 
class="n">consttype</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">INT4OID</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">DatumGetInt32</span><span class="p">(</span><span class="n">cnst</span><span class="o">-&gt;</span><span class="n">constvalue</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">val</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown consttype oid: %u]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">cnst</span><span class="o">-&gt;</span><span class="n">consttype</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_var</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span 
class="n">Var</span><span class="o">*</span><span class="w"> </span><span class="n">var</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="n">RangeTblEntry</span><span class="o">*</span><span class="w"> </span><span class="n">rte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">list_nth</span><span class="p">(</span><span class="n">stmt</span><span class="o">-&gt;</span><span class="n">rtable</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-&gt;</span><span class="n">varno</span><span class="mi">-1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">RTE_RELATION</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">elog</span><span class="p">(</span><span class="n">LOG</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unsupported relation type for var: %d].&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">rtekind</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">get_attname</span><span 
class="p">(</span><span class="n">rte</span><span class="o">-&gt;</span><span class="n">relid</span><span class="p">,</span><span class="w"> </span><span class="n">var</span><span class="o">-&gt;</span><span class="n">varattno</span><span class="p">,</span><span class="w"> </span><span class="nb">false</span><span class="p">);</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="n">pfree</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span 
class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Var</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_var</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Var</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span 
class="nf">buffer_print_list</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">List</span><span class="o">*</span><span class="w"> </span><span class="n">list</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="o">*</span><span class="w"> </span><span class="n">sep</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ListCell</span><span class="o">*</span><span class="w"> </span><span class="n">cell</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span> <span class="w"> </span><span class="n">foreach</span><span class="p">(</span><span class="n">cell</span><span class="p">,</span><span class="w"> </span><span class="n">list</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">sep</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">first</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)</span><span class="n">lfirst</span><span class="p">(</span><span class="n">cell</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_where</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">buffer_appendz</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; WHERE &quot;</span><span class="p">);</span> <span 
class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">qual</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; AND &quot;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 20:03:14.351 GMT [3254458] LOG: [pgexec] SELECT [columns todo] FROM x WHERE a &gt; 1</span> <span class="go">2023-11-19 20:03:14.351 GMT [3254458] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Huzzah!</p> <h3 id="walking-the-column-list">Walking the column list</h3><p>Let's get rid of <code>[columns todo]</code>. We already had the idea that <code>List* targetlist</code> on the <code>Plan</code> struct was a list of expression <code>Node</code>s. 
Let's try it.</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_select_columns</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">QueryDesc</span><span class="o">*</span><span class="w"> </span><span class="n">queryDesc</span><span class="p">,</span><span class="w"> </span><span class="n">Plan</span><span class="o">*</span><span class="w"> </span><span class="n">plan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">targetlist</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">NULL</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">buffer_print_list</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">queryDesc</span><span class="o">-&gt;</span><span class="n">plannedstmt</span><span class="p">,</span><span class="w"> </span><span class="n">plan</span><span class="o">-&gt;</span><span class="n">targetlist</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;, &quot;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. 
Check the Postgres server logs and you should see:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 20:12:48.091 GMT [3255398] LOG: [pgexec] SELECT [Unknown: 53] FROM x WHERE a &gt; 1</span> <span class="go">2023-11-19 20:12:48.091 GMT [3255398] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>Hmm. Let's look up <code>Node</code> <code>53</code> in Postgres:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span><span class="s1">&#39; = 53,&#39;</span><span class="w"> </span>src/include/nodes/nodetags.h <span class="go"> T_TargetEntry = 53,</span> </pre></div> <p>Based on the definition of <a href="https://github.com/postgres/postgres/blob/b218fbb7a35fcf31539bfad12732038fe082a2eb/src/include/nodes/primnodes.h#L1918"><code>TargetEntry</code></a>, it looks like we can ignore most of the fields (because we don't need to handle <code>SELECT a AS b</code> yet) and just proxy the child <code>expr</code> field.</p> <p>Let's add a <code>case T_TargetEntry</code> to <code>buffer_print_expr</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">buffer_print_expr</span><span class="p">(</span><span class="n">PGExec_Buffer</span><span class="o">*</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">PlannedStmt</span><span class="o">*</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">Node</span><span class="o">*</span><span class="w"> </span><span class="n">expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span 
class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Const</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_const</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Const</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_Var</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_var</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Var</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">T_TargetEntry</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_expr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">Node</span><span class="o">*</span><span class="p">)((</span><span class="n">TargetEntry</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span 
class="k">case</span><span class="w"> </span><span class="no">T_OpExpr</span><span class="p">:</span> <span class="w"> </span><span class="n">buffer_print_opexpr</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">OpExpr</span><span class="o">*</span><span class="p">)</span><span class="n">expr</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">buffer_appendf</span><span class="p">(</span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;[Unknown: %d]&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nodeTag</span><span class="p">(</span><span class="n">expr</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Rebuild and reinstall the extension, and restart Postgres. Re-run the <code>test.sql</code> script. 
Check the Postgres server logs and:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 20:17:51.114 GMT [3257827] LOG: [pgexec] SELECT a FROM x WHERE a &gt; 1</span> <span class="go">2023-11-19 20:17:51.114 GMT [3257827] STATEMENT: SELECT a FROM x WHERE a &gt; 1;</span> </pre></div> <p>We did it!</p> <h3 id="variations">Variations</h3><p>Let's try out some other queries to make sure this wasn't just luck.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-c<span class="w"> </span><span class="s1">&#39;SELECT a + 1 FROM x&#39;</span> <span class="go"> ?column?</span> <span class="go">----------</span> <span class="go"> 310</span> <span class="gp gp-VirtualEnv">(1 row)</span> <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>postgres<span class="w"> </span>-c<span class="w"> </span><span class="s1">&#39;SELECT a + 1 FROM x WHERE 2 &gt; a&#39;</span> <span class="go"> ?column?</span> <span class="go">----------</span> <span class="gp gp-VirtualEnv">(0 rows)</span> </pre></div> <p>And back in the Postgres server logs:</p> <div class="highlight"><pre><span></span><span class="go">2023-11-19 20:19:28.057 GMT [3257874] LOG: [pgexec] SELECT a + 1 FROM x</span> <span class="go">2023-11-19 20:19:28.057 GMT [3257874] STATEMENT: SELECT a + 1 FROM x</span> <span class="go">2023-11-19 20:19:30.474 GMT [3257878] LOG: [pgexec] SELECT a + 1 FROM x WHERE 2 &gt; a</span> <span class="go">2023-11-19 20:19:30.474 GMT [3257878] STATEMENT: SELECT a + 1 FROM x WHERE 2 &gt; a</span> </pre></div> <p>Not bad!</p> <h3 id="next-steps">Next steps</h3><p>Printing out the statement here isn't incredibly useful. 
But it establishes a basis for future work that might bypass Postgres's query execution engine and either do the execution ourselves or proxy execution to another system.</p> <h3 id="postscript:-on-chatgpt">Postscript: On ChatGPT</h3><p>My recent Postgres explorations would have been basically impossible if it weren't for being able to ask ChatGPT simple, stupid questions like "How do I get from a Postgres <code>Var</code> to a column name".</p> <p>It isn't always right. It doesn't always give great code. Actually, it normally gives pretty weird code. But it's been extremely useful for quick iteration when I get stuck.</p> <p>The only other places the information exists are small blog posts around the internet, the Postgres mailing lists (which so far haven't been super responsive for me), and the code itself.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I&#39;ve been on a Postgres roll. Let&#39;s dig into interpreting a Postgres query plan in preparation for future work on completely diverting the flow of Postgres query execution using execution hooks!<a href="https://t.co/EZrgoIiTuX">https://t.co/EZrgoIiTuX</a> <a href="https://t.co/7S6d6olPX8">pic.twitter.com/7S6d6olPX8</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1726336428626587710?ref_src=twsrc%5Etfw">November 19, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-11-19-exploring-a-postgres-query-plan.htmlSun, 19 Nov 2023 00:00:00 +0000Writing a storage engine for Postgres: an in-memory Table Access Methodhttp://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.html<p>With <a href="https://www.postgresql.org/docs/release/12.0/">Postgres 12</a>, released in 2019, it became possible to <a href="https://www.pgcon.org/2019/schedule/attachments/536_pgcon2019_pluggable_table_AM_V1.3.pdf">swap out Postgres's storage engine</a>.</p> <p>This is a feature MySQL 
has supported for a long time. There are at least <a href="https://github.com/eatonphil/pgtam">8 different</a> <em>built-in</em> engines you can pick from. <a href="https://myrocks.io/">MyRocks</a>, MySQL on RocksDB, is a popular third-party distribution.</p> <p>I assume there will be a renaissance of Postgres storage engines. To date, the efforts are nascent. <a href="https://github.com/orioledb/orioledb">OrioleDB</a> and <a href="https://github.com/citusdata/citus/blob/main/src/backend/columnar/README.md">Citus Columnar</a> are two promising third-party table access methods being actively developed.</p> <h3 id="why-alternative-storage-engines?">Why alternative storage engines?</h3><p>The ability to swap storage engines is useful because different workloads sometimes benefit from different storage approaches. Analytics workloads and columnar storage layouts <a href="https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html">go well together</a>. Write-heavy workloads and LSM trees <a href="https://github.com/wiredtiger/wiredtiger/wiki/Btree-vs-LSM">go well together</a>. And some people like in-memory storage for running integration tests.</p> <p>By swapping out only the storage engine, you get the benefit of the rest of the Postgres or MySQL infrastructure: the query language, the wire protocol, the ecosystem, etc.</p> <h3 id="why-not-foreign-data-wrappers?">Why not foreign data wrappers?</h3><p>Very little has been written about the difference between foreign data wrappers (FDWs) and table access methods. Table access methods seem to be the lower-level layer, where presumably you get better performance and cleaner integration. But there is clearly overlap between these two extension options.</p> <p>For example, there is a <a href="https://github.com/ildus/clickhouse_fdw">FDW for ClickHouse</a>, so when you create tables and rows and query the tables you are really creating and querying rows in a ClickHouse server. 
Similarly there's a <a href="https://github.com/vidardb/pgrocks-fdw">FDW for RocksDB</a>. And Citus's columnar engine works <a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/#:~:text=What%20About%20cstore_fdw%3F">either</a> as a foreign data wrapper or a table access method.</p> <p>The Citus page draws the clearest distinction between FDWs and table access methods, but even that page is vague. Performance doesn't seem to be the main difference. Closer integration, and thus the ability to look more like vanilla Postgres from the outside, seems to be the gist.</p> <p>In any case, I wanted to explore the table access method API.</p> <h3 id="digging-in">Digging in</h3><p>I haven't written Postgres extensions before and I've never written C professionally. If you're familiar with Postgres internals or C and notice something funky, please <a href="mailto:[email protected]">let me know</a>!</p> <p>It turns out that almost no one has written about how to implement the minimal table access methods for various storage engine operations. 
So after quite a bit of stumbling to get the basics of an in-memory storage engine working, I'm going to walk you through my approach.</p> <p>This is prototype-quality code which hopefully will be a useful base for further exploration.</p> <p>All code for this post is <a href="https://github.com/eatonphil/pgtam">available on GitHub</a>.</p> <h3 id="a-debug-postgres-build">A debug Postgres build</h3><p>First off, let's make a <a href="https://wiki.postgresql.org/wiki/Developer_FAQ#Compile-time">debug build</a> of Postgres.</p> <div class="highlight"><pre><span></span><span class="n">$</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="k">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="p">.</span><span class="n">com</span><span class="o">/</span><span class="n">postgres</span><span class="o">/</span><span class="n">postgres</span> <span class="n">$</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">postgres</span> <span class="n">$</span><span class="w"> </span><span class="c1"># An arbitrary commit on `master` after Postgres 16 that I happen to be on</span> <span class="n">$</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">checkout</span><span class="w"> </span><span class="n">849172ff4883d44168f96f39d3fde96d0aa34c99</span> <span class="n">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">configure</span><span class="w"> </span><span class="o">--</span><span class="k">enable</span><span class="o">-</span><span class="n">cassert</span><span class="w"> </span><span class="o">--</span><span class="k">enable</span><span class="o">-</span><span class="n">debug</span><span class="w"> </span><span class="n">CFLAGS</span><span class="o">=</span><span class="s2">&quot;-ggdb -Og -g3 -fno-omit-frame-pointer&quot;</span> <span 
class="n">$</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="o">-</span><span class="n">j8</span> <span class="n">$</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="k">install</span> </pre></div> <p>This will install Postgres binaries (e.g. <code>psql</code>, <code>pg_ctl</code>, <code>initdb</code>, <code>pg_config</code>) into <code>/usr/local/pgsql/bin</code>.</p> <p>I'm going to reference those absolute paths throughout this post because you might have a system (package manager) install of Postgres already.</p> <p>Let's create a database and start up this debug build:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/initdb<span class="w"> </span>test-db <span class="gp">$ </span>/usr/local/pgsql/bin/pg_ctl<span class="w"> </span>-D<span class="w"> </span>test-db<span class="w"> </span>-l<span class="w"> </span>logfile<span class="w"> </span>start </pre></div> <h3 id="extension-infrastructure">Extension infrastructure</h3><p>Since we installed Postgres from scratch, <code>/usr/local/pgsql/bin/pg_config</code> will supply all of the infrastructure we need.</p> <p>The "infrastructure" is basically just <a href="https://www.postgresql.org/docs/current/extend-pgxs.html">PGXS</a>: Postgres Makefile utilities.</p> <p>It's convention-heavy. 
So in a new <code>Makefile</code> for this project we'll specify:</p> <ol> <li><code>MODULES</code>: Any C sources to build, without the <code>.c</code> file extension</li> <li><code>EXTENSION</code>: Extension metadata file, without the <code>.control</code> file extension</li> <li><code>DATA</code>: A SQL file that is executed when the extension is created with <code>CREATE EXTENSION</code>, this time with the <code>.sql</code> extension</li> </ol> <div class="highlight"><pre><span></span><span class="nv">MODULES</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam <span class="nv">EXTENSION</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam <span class="nv">DATA</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>pgtam--0.0.1.sql <span class="nv">PG_CONFIG</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>/usr/local/pgsql/bin/pg_config <span class="nv">PGXS</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">$(</span>shell<span class="w"> </span><span class="k">$(</span>PG_CONFIG<span class="k">)</span><span class="w"> </span>--pgxs<span class="k">)</span> <span class="cp">include $(PGXS)</span> </pre></div> <p>The final three lines set up the PGXS Makefile library for the particular Postgres installation we want to build the extension against and install it into.</p> <p>PGXS gives us a few important targets, like <code>make distclean</code>, <code>make</code>, and <code>make install</code>, that we'll use later on.</p> <h4 id="<code>pgtam.c</code>"><code>pgtam.c</code></h4><p>A minimal C file that registers a function capable of serving as a table access method is:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;postgres.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;fmgr.h&quot;</span> <span 
class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;access/tableam.h&quot;</span> <span class="n">PG_MODULE_MAGIC</span><span class="p">;</span> <span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span class="w"> </span><span class="n">memam_methods</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">T_TableAmRoutine</span><span class="p">,</span> <span class="p">};</span> <span class="n">PG_FUNCTION_INFO_V1</span><span class="p">(</span><span class="n">mem_tableam_handler</span><span class="p">);</span> <span class="n">Datum</span><span class="w"> </span><span class="nf">mem_tableam_handler</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">PG_RETURN_POINTER</span><span class="p">(</span><span class="o">&amp;</span><span class="n">memam_methods</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p class="note"> If you want to read about extension basics without the complexity of table access methods, you can find a complete, minimal Postgres extension I wrote to validate the infrastructure <a href="https://github.com/eatonphil/pgext-101">here</a>. Or you can follow a <a href="https://github.com/IshaanAdarsh/Postgres-extension-tutorial/blob/main/SGML/intro_and_toc.md">larger tutorial</a>. </p><p>The workflow for registering a table access method is to first run <code>CREATE EXTENSION pgtam</code>. 
This assumes <code>pgtam</code> is an extension that has a function that returns a <code>TableAmRoutine</code> struct instance, i.e. a table of callbacks that implement the access method.</p> <p>Then you must run <code>CREATE ACCESS METHOD mem TYPE TABLE HANDLER mem_tableam_handler</code>. And finally you can use the access method when creating a table with <code>USING mem</code>: <code>CREATE TABLE x(a INT) USING mem</code>.</p> <h4 id="<code>pgtam.control</code>"><code>pgtam.control</code></h4><p>This file contains extension metadata. At a minimum, it specifies the default version of the extension and the path of the extension's shared library.</p> <div class="highlight"><pre><span></span>default_version = &#39;0.0.1&#39; module_pathname = &#39;$libdir/pgtam&#39; </pre></div> <h4 id="<code>pgtam--0.0.1.sql</code>"><code>pgtam--0.0.1.sql</code></h4><p>Finally, in <code>pgtam--0.0.1.sql</code> (which is executed when we call <code>CREATE EXTENSION pgtam</code>), we declare the handler as a C-language function, and then we register that function as an access method.</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">mem_tableam_handler</span><span class="p">(</span><span class="n">internal</span><span class="p">)</span> <span class="k">RETURNS</span><span class="w"> </span><span class="n">table_am_handler</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="s1">&#39;pgtam&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;mem_tableam_handler&#39;</span> <span class="k">LANGUAGE</span><span class="w"> </span><span class="k">C</span><span class="w"> </span><span class="k">STRICT</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">ACCESS</span><span class="w"> 
</span><span class="k">METHOD</span><span class="w"> </span><span class="n">mem</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="k">HANDLER</span><span class="w"> </span><span class="n">mem_tableam_handler</span><span class="p">;</span> </pre></div> <h4 id="build">Build</h4><p>Now that we've got all the pieces in place, we can build and install the extension.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>make $<span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install </pre></div> <p>Let's add a <code>test.sql</code> script to exercise the extension:</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">pgtam</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pgtam</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span> </pre></div> <p>And run it:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">psql:test.sql:3: server closed the connection unexpectedly</span> <span 
class="go"> This probably means the server terminated abnormally</span> <span class="go"> before or while processing the request.</span> <span class="go">psql:test.sql:3: error: connection to server was lost</span> </pre></div> <p>Ok, so the server crashed and <code>psql</code> lost its connection! Let's look at the server logs. When we started Postgres with <code>pg_ctl</code> we specified the log file as <code>logfile</code> in the directory where we ran <code>pg_ctl</code>.</p> <p>If we look through it we'll spot an assertion failure:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>grep<span class="w"> </span>Assert<span class="w"> </span>logfile <span class="go">TRAP: failed Assert(&quot;routine-&gt;scan_begin != NULL&quot;), File: &quot;tableamapi.c&quot;, Line: 52, PID: 2906922</span> </pre></div> <p>That's a great sign! This is Postgres's debug infrastructure helping to make sure the table access method is correctly implemented.</p> <h3 id="table-access-method-stubs">Table access method stubs</h3><p>The next step is to add function stubs for all the non-optional methods of the <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/tableam.h#L282"><code>TableAmRoutine</code> struct</a>.</p> <p>I've done all the work for you already so you can just copy this over the existing <code>pgtam.c</code>. It's a big file, but don't worry. There's nothing to explain. 
Just a bunch of blank functions returning default values when required.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;postgres.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;fmgr.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;access/tableam.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;access/heapam.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;nodes/execnodes.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;catalog/index.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;commands/vacuum.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;utils/builtins.h&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;executor/tuptable.h&quot;</span> <span class="n">PG_MODULE_MAGIC</span><span class="p">;</span> <span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span class="w"> </span><span class="n">memam_methods</span><span class="p">;</span> <span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="o">*</span><span class="w"> </span><span class="nf">memam_slot_callbacks</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span 
class="n">TableScanDesc</span><span class="w"> </span><span class="nf">memam_beginscan</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">nkeys</span><span class="p">,</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">ScanKeyData</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="w"> </span><span class="n">ParallelTableScanDesc</span><span class="w"> </span><span class="n">parallel_scan</span><span class="p">,</span> <span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">flags</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_rescan</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">ScanKeyData</span><span class="w"> </span><span class="o">*</span><span class="n">key</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">set_params</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_strat</span><span class="p">,</span> 
<span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_sync</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_pagemode</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_endscan</span><span class="p">(</span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_getnextslot</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span> <span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">IndexFetchTableData</span><span class="o">*</span><span class="w"> </span><span class="nf">memam_index_fetch_begin</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span 
class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_fetch_reset</span><span class="p">(</span><span class="n">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_fetch_end</span><span class="p">(</span><span class="n">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_index_fetch_tuple</span><span class="p">(</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">IndexFetchTableData</span><span class="w"> </span><span class="o">*</span><span class="n">scan</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">call_again</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">all_dead</span> <span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span> <span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert_speculative</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span> <span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span><span 
class="p">,</span> <span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">specToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_complete_speculative</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">specToken</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">succeeded</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_multi_insert</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">**</span><span class="n">slots</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">ntuples</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span> <span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> 
</span><span class="n">bistate</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="nf">memam_tuple_delete</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">wait</span><span class="p">,</span> <span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">changingPart</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="nf">memam_tuple_update</span><span class="p">(</span> 
<span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">otid</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">crosscheck</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">wait</span><span class="p">,</span> <span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">,</span> <span class="w"> </span><span class="n">LockTupleMode</span><span class="w"> </span><span class="o">*</span><span class="n">lockmode</span><span class="p">,</span> <span class="w"> </span><span class="n">TU_UpdateIndexes</span><span class="w"> </span><span class="o">*</span><span class="n">update_indexes</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span 
class="nf">memam_tuple_lock</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="n">LockTupleMode</span><span class="w"> </span><span class="n">mode</span><span class="p">,</span> <span class="w"> </span><span class="n">LockWaitPolicy</span><span class="w"> </span><span class="n">wait_policy</span><span class="p">,</span> <span class="w"> </span><span class="n">uint8</span><span class="w"> </span><span class="n">flags</span><span class="p">,</span> <span class="w"> </span><span class="n">TM_FailureData</span><span class="w"> </span><span class="o">*</span><span class="n">tmfd</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">TM_Result</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_fetch_row_version</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span 
class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_get_latest_tid</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span> <span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_tuple_tid_valid</span><span class="p">(</span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span><span class="w"> </span><span class="n">ItemPointer</span><span class="w"> </span><span class="n">tid</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_tuple_satisfies_snapshot</span><span 
class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="nf">memam_index_delete_tuples</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="n">TM_IndexDeleteOp</span><span class="w"> </span><span class="o">*</span><span class="n">delstate</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">id</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_set_new_filelocator</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">RelFileLocator</span><span class="w"> 
</span><span class="o">*</span><span class="n">newrlocator</span><span class="p">,</span> <span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="n">persistence</span><span class="p">,</span> <span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="o">*</span><span class="n">freezeXid</span><span class="p">,</span> <span class="w"> </span><span class="n">MultiXactId</span><span class="w"> </span><span class="o">*</span><span class="n">minmulti</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_nontransactional_truncate</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_copy_data</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">RelFileLocator</span><span class="w"> </span><span class="o">*</span><span class="n">newrlocator</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_relation_copy_for_cluster</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">OldHeap</span><span class="p">,</span> <span class="w"> </span><span 
class="n">Relation</span><span class="w"> </span><span class="n">NewHeap</span><span class="p">,</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">OldIndex</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">use_sort</span><span class="p">,</span> <span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">OldestXmin</span><span class="p">,</span> <span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="o">*</span><span class="n">xid_cutoff</span><span class="p">,</span> <span class="w"> </span><span class="n">MultiXactId</span><span class="w"> </span><span class="o">*</span><span class="n">multi_cutoff</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">num_tuples</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tups_vacuumed</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tups_recently_dead</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_vacuum_rel</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="n">VacuumParams</span><span class="w"> </span><span class="o">*</span><span class="n">params</span><span class="p">,</span> <span class="w"> </span><span class="n">BufferAccessStrategy</span><span class="w"> </span><span class="n">bstrategy</span> 
<span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_analyze_next_block</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span> <span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">blockno</span><span class="p">,</span> <span class="w"> </span><span class="n">BufferAccessStrategy</span><span class="w"> </span><span class="n">bstrategy</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_analyze_next_tuple</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span> <span class="w"> </span><span class="n">TransactionId</span><span class="w"> </span><span class="n">OldestXmin</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">liverows</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">deadrows</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="nf">memam_index_build_range_scan</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">heapRelation</span><span class="p">,</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">indexRelation</span><span class="p">,</span> <span class="w"> </span><span class="n">IndexInfo</span><span class="w"> </span><span class="o">*</span><span class="n">indexInfo</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">allow_sync</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">anyvisible</span><span class="p">,</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">progress</span><span class="p">,</span> <span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">start_blockno</span><span class="p">,</span> <span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="n">numblocks</span><span class="p">,</span> <span class="w"> </span><span class="n">IndexBuildCallback</span><span class="w"> </span><span class="n">callback</span><span class="p">,</span> <span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="o">*</span><span class="n">callback_state</span><span class="p">,</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> 
<span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_index_validate_scan</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">heapRelation</span><span class="p">,</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">indexRelation</span><span class="p">,</span> <span class="w"> </span><span class="n">IndexInfo</span><span class="w"> </span><span class="o">*</span><span class="n">indexInfo</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span class="n">ValidateIndexState</span><span class="w"> </span><span class="o">*</span><span class="n">state</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_relation_needs_toast_table</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="nf">memam_relation_toast_am</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">oid</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">oid</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_fetch_toast_slice</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">toastrel</span><span class="p">,</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">valueid</span><span class="p">,</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">attrsize</span><span class="p">,</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">sliceoffset</span><span class="p">,</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">slicelength</span><span class="p">,</span> <span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">varlena</span><span class="w"> </span><span class="o">*</span><span class="n">result</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_estimate_rel_size</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">rel</span><span class="p">,</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="o">*</span><span class="n">attr_widths</span><span class="p">,</span> <span class="w"> </span><span class="n">BlockNumber</span><span class="w"> </span><span class="o">*</span><span class="n">pages</span><span class="p">,</span> <span class="w"> </span><span 
class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">tuples</span><span class="p">,</span> <span class="w"> </span><span class="kt">double</span><span class="w"> </span><span class="o">*</span><span class="n">allvisfrac</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_sample_next_block</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span><span class="w"> </span><span class="n">SampleScanState</span><span class="w"> </span><span class="o">*</span><span class="n">scanstate</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">static</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_scan_sample_next_tuple</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">scan</span><span class="p">,</span> <span class="w"> </span><span class="n">SampleScanState</span><span class="w"> </span><span class="o">*</span><span class="n">scanstate</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="p">}</span> <span class="k">const</span><span class="w"> </span><span class="n">TableAmRoutine</span><span 
class="w"> </span><span class="n">memam_methods</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">T_TableAmRoutine</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">slot_callbacks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_slot_callbacks</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_beginscan</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_endscan</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_rescan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_rescan</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_getnextslot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_getnextslot</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">parallelscan_estimate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_estimate</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">parallelscan_initialize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_initialize</span><span class="p">,</span> <span class="w"> </span><span 
class="p">.</span><span class="n">parallelscan_reinitialize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_parallelscan_reinitialize</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_fetch_begin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_begin</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_fetch_reset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_reset</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_fetch_end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_fetch_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_fetch_tuple</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_insert</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_insert_speculative</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_insert_speculative</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_complete_speculative</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_complete_speculative</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">multi_insert</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">memam_multi_insert</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_delete</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_delete</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_update</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_update</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_lock</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_lock</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_fetch_row_version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_fetch_row_version</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_get_latest_tid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_get_latest_tid</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_tid_valid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_tid_valid</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">tuple_satisfies_snapshot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_tuple_satisfies_snapshot</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_delete_tuples</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_delete_tuples</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span 
class="n">relation_set_new_filelocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_nontransactional_truncate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_nontransactional_truncate</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_copy_data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_copy_data</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_copy_for_cluster</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_copy_for_cluster</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_vacuum</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_vacuum_rel</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_analyze_next_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_analyze_next_block</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_analyze_next_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_analyze_next_tuple</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_build_range_scan</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_build_range_scan</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index_validate_scan</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">memam_index_validate_scan</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table_block_relation_size</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_needs_toast_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_toast_am</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_relation_toast_am</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_fetch_toast_slice</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_fetch_toast_slice</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">relation_estimate_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_sample_next_block</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_sample_next_block</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">scan_sample_next_tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">memam_scan_sample_next_tuple</span> <span class="p">};</span> <span class="n">PG_FUNCTION_INFO_V1</span><span class="p">(</span><span class="n">mem_tableam_handler</span><span class="p">);</span> <span class="n">Datum</span><span class="w"> </span><span 
class="nf">mem_tableam_handler</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">PG_RETURN_POINTER</span><span class="p">(</span><span class="o">&amp;</span><span class="n">memam_methods</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Let's build and test it!</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> </pre></div> <p>Hey we're getting somewhere! 
It successfully created the table with our custom table access method.</p> <h3 id="querying-rows">Querying rows</h3><p>Next, let's try querying the table by adding a <code>SELECT a FROM x</code> to <code>test.sql</code> and running it:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">psql:test.sql:6: server closed the connection unexpectedly</span> <span class="go"> This probably means the server terminated abnormally</span> <span class="go"> before or while processing the request.</span> <span class="go">psql:test.sql:6: error: connection to server was lost</span> </pre></div> <p>This time there's nothing in <code>logfile</code> that helps:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>tail<span class="w"> </span>-n15<span class="w"> </span>logfile <span class="go">2023-11-01 18:43:32.449 UTC [2906199] LOG: database system is ready to accept connections</span> <span class="go">2023-11-01 18:58:32.572 UTC [2907997] LOG: checkpoint starting: time</span> <span class="go">2023-11-01 18:58:35.305 UTC [2907997] LOG: checkpoint complete: wrote 28 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=2.712 s, sync=0.015 s, total=2.733 s; sync files=23, longest=0.004 s, average=0.001 s; distance=128 kB, estimate=150 kB; lsn=0/15F88E0, redo lsn=0/15F8888</span> <span class="go">2023-11-01 19:08:14.485 UTC [2906199] LOG: server process (PID 2908242) was terminated by signal 11: Segmentation fault</span> <span class="go">2023-11-01 19:08:14.485 UTC [2906199] DETAIL: Failed process was running: SELECT a FROM x;</span> <span class="go">2023-11-01 19:08:14.485 UTC [2906199] LOG: terminating any other 
active server processes</span> <span class="go">2023-11-01 19:08:14.486 UTC [2906199] LOG: all server processes terminated; reinitializing</span> <span class="go">2023-11-01 19:08:14.508 UTC [2908253] LOG: database system was interrupted; last known up at 2023-11-01 18:58:35 UTC</span> <span class="go">2023-11-01 19:08:14.518 UTC [2908253] LOG: database system was not properly shut down; automatic recovery in progress</span> <span class="go">2023-11-01 19:08:14.519 UTC [2908253] LOG: redo starts at 0/15F8888</span> <span class="go">2023-11-01 19:08:14.520 UTC [2908253] LOG: invalid record length at 0/161DE70: expected at least 24, got 0</span> <span class="go">2023-11-01 19:08:14.520 UTC [2908253] LOG: redo done at 0/161DE38 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s</span> <span class="go">2023-11-01 19:08:14.521 UTC [2908254] LOG: checkpoint starting: end-of-recovery immediate wait</span> <span class="go">2023-11-01 19:08:14.532 UTC [2908254] LOG: checkpoint complete: wrote 35 buffers (0.2%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.010 s, total=0.012 s; sync files=27, longest=0.003 s, average=0.001 s; distance=149 kB, estimate=149 kB; lsn=0/161DE70, redo lsn=0/161DE70</span> <span class="go">2023-11-01 19:08:14.533 UTC [2906199] LOG: database system is ready to accept connections</span> </pre></div> <p>This was the first place I got stuck. How on earth do I figure out what methods to implement? I mean, it's clearly one or more of these methods from the struct. 
But there are so many methods.</p> <p>I tried setting a breakpoint in <code>gdb</code> on the process returned by <code>SELECT pg_backend_pid()</code> for a <code>psql</code> session, but the breakpoint never seemed to be hit for any of my methods.</p> <p>So I did the low-tech solution and opened a file, <code>/tmp/pgtam.log</code>, turned off buffering on it, and added a log to every method on the <code>TableAmRoutine</code> struct:</p> <div class="highlight"><pre><span></span><span class="gu">@@ -12,9 +12,13 @@</span> <span class="w"> </span>const TableAmRoutine memam_methods; <span class="gi">+FILE* fd;</span> <span class="gi">+#define DEBUG_FUNC() fprintf(fd, &quot;in %s\n&quot;, __func__);</span> <span class="gi">+</span> <span class="w"> </span>static const TupleTableSlotOps* memam_slot_callbacks( <span class="w"> </span> Relation relation <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return NULL; <span class="w"> </span>} <span class="gu">@@ -26,6 +30,7 @@</span> <span class="w"> </span> ParallelTableScanDesc parallel_scan, <span class="w"> </span> uint32 flags <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return NULL; <span class="w"> </span>} <span class="gu">@@ -37,9 +42,11 @@</span> <span class="w"> </span> bool allow_sync, <span class="w"> </span> bool allow_pagemode <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_endscan(TableScanDesc sscan) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_getnextslot( <span class="gu">@@ -47,17 +54,21 @@</span> <span class="w"> </span> ScanDirection direction, <span class="w"> </span> TupleTableSlot *slot <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="w"> </span>static 
IndexFetchTableData* memam_index_fetch_begin(Relation rel) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return NULL; <span class="w"> </span>} <span class="w"> </span>static void memam_index_fetch_reset(IndexFetchTableData *scan) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_index_fetch_end(IndexFetchTableData *scan) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_index_fetch_tuple( <span class="gu">@@ -68,6 +79,7 @@</span> <span class="w"> </span> bool *call_again, <span class="w"> </span> bool *all_dead <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -78,6 +90,7 @@</span> <span class="w"> </span> int options, <span class="w"> </span> BulkInsertState bistate <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_tuple_insert_speculative( <span class="gu">@@ -87,6 +100,7 @@</span> <span class="w"> </span> int options, <span class="w"> </span> BulkInsertState bistate, <span class="w"> </span> uint32 specToken) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_tuple_complete_speculative( <span class="gu">@@ -94,6 +108,7 @@</span> <span class="w"> </span> TupleTableSlot *slot, <span class="w"> </span> uint32 specToken, <span class="w"> </span> bool succeeded) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_multi_insert( <span class="gu">@@ -104,6 +119,7 @@</span> <span class="w"> </span> int options, <span class="w"> </span> BulkInsertState bistate <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static TM_Result memam_tuple_delete( <span class="gu">@@ 
-117,6 +133,7 @@</span> <span class="w"> </span> bool changingPart <span class="w"> </span>) { <span class="w"> </span> TM_Result result = {}; <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return result; <span class="w"> </span>} <span class="gu">@@ -133,6 +150,7 @@</span> <span class="w"> </span> TU_UpdateIndexes *update_indexes <span class="w"> </span>) { <span class="w"> </span> TM_Result result = {}; <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return result; <span class="w"> </span>} <span class="gu">@@ -148,6 +166,7 @@</span> <span class="w"> </span> TM_FailureData *tmfd) <span class="w"> </span>{ <span class="w"> </span> TM_Result result = {}; <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return result; <span class="w"> </span>} <span class="gu">@@ -157,6 +176,7 @@</span> <span class="w"> </span> Snapshot snapshot, <span class="w"> </span> TupleTableSlot *slot <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -164,9 +184,11 @@</span> <span class="w"> </span> TableScanDesc sscan, <span class="w"> </span> ItemPointer tid <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_tuple_tid_valid(TableScanDesc scan, ItemPointer tid) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -175,6 +197,7 @@</span> <span class="w"> </span> TupleTableSlot *slot, <span class="w"> </span> Snapshot snapshot <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -183,6 +206,7 @@</span> <span class="w"> </span> TM_IndexDeleteOp *delstate <span class="w"> </span>) { <span class="w"> </span> TransactionId id = {}; <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> 
</span> return id; <span class="w"> </span>} <span class="gu">@@ -193,17 +217,20 @@</span> <span class="w"> </span> TransactionId *freezeXid, <span class="w"> </span> MultiXactId *minmulti <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_relation_nontransactional_truncate( <span class="w"> </span> Relation rel <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_relation_copy_data( <span class="w"> </span> Relation rel, <span class="w"> </span> const RelFileLocator *newrlocator <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_relation_copy_for_cluster( <span class="gu">@@ -218,6 +245,7 @@</span> <span class="w"> </span> double *tups_vacuumed, <span class="w"> </span> double *tups_recently_dead <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_vacuum_rel( <span class="gu">@@ -225,6 +253,7 @@</span> <span class="w"> </span> VacuumParams *params, <span class="w"> </span> BufferAccessStrategy bstrategy <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_scan_analyze_next_block( <span class="gu">@@ -232,6 +261,7 @@</span> <span class="w"> </span> BlockNumber blockno, <span class="w"> </span> BufferAccessStrategy bstrategy <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -242,6 +272,7 @@</span> <span class="w"> </span> double *deadrows, <span class="w"> </span> TupleTableSlot *slot <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -258,6 +289,7 
@@</span> <span class="w"> </span> void *callback_state, <span class="w"> </span> TableScanDesc scan <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return 0; <span class="w"> </span>} <span class="gu">@@ -268,14 +300,17 @@</span> <span class="w"> </span> Snapshot snapshot, <span class="w"> </span> ValidateIndexState *state <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_relation_needs_toast_table(Relation rel) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="w"> </span>static Oid memam_relation_toast_am(Relation rel) { <span class="w"> </span> Oid oid = {}; <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return oid; <span class="w"> </span>} <span class="gu">@@ -287,6 +322,7 @@</span> <span class="w"> </span> int32 slicelength, <span class="w"> </span> struct varlena *result <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static void memam_estimate_rel_size( <span class="gu">@@ -296,11 +332,13 @@</span> <span class="w"> </span> double *tuples, <span class="w"> </span> double *allvisfrac <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span>} <span class="w"> </span>static bool memam_scan_sample_next_block( <span class="w"> </span> TableScanDesc scan, SampleScanState *scanstate <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} <span class="gu">@@ -309,6 +347,7 @@</span> <span class="w"> </span> SampleScanState *scanstate, <span class="w"> </span> TupleTableSlot *slot <span class="w"> </span>) { <span class="gi">+ DEBUG_FUNC();</span> <span class="w"> </span> return false; <span class="w"> </span>} </pre></div> <p>And then in the entrypoint, initialize the file 
for logging.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -369,5 +408,9 @@</span> <span class="w"> </span>PG_FUNCTION_INFO_V1(mem_tableam_handler); <span class="w"> </span>Datum mem_tableam_handler(PG_FUNCTION_ARGS) { <span class="gi">+ fd = fopen(&quot;/tmp/pgtam.log&quot;, &quot;a&quot;);</span> <span class="gi">+ setvbuf(fd, NULL, _IONBF, 0); // Prevent buffering</span> <span class="gi">+ fprintf(fd, &quot;\n\nmem_tableam handler loaded\n&quot;);</span> <span class="gi">+</span> <span class="w"> </span> PG_RETURN_POINTER(&amp;memam_methods); <span class="w"> </span>} </pre></div> <p>Let's give it a shot!</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">psql:test.sql:6: server closed the connection unexpectedly</span> <span class="go"> This probably means the server terminated abnormally</span> <span class="go"> before or while processing the request.</span> <span class="go">psql:test.sql:6: error: connection to server was lost</span> </pre></div> <p>And let's check our log file:</p> <div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="o">.</span><span class="n">log</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span 
class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span> </pre></div> <p>Now we're getting somewhere!</p> <p class="note"> I later realized <code>elog()</code> is the way most people log within Postgres/within extensions. I didn't know that when I was getting started though. This separate logging was a simple way to get the info out. </p><h4 id="<code>slot_callbacks</code>"><code>slot_callbacks</code></h4><p>Since the request crashes and the last logged function is <code>memam_slot_callbacks</code>, it seems like that is where we should concentrate. 
The <a href="https://www.postgresql.org/docs/current/tableam.html">table access method docs</a> suggest looking at the default <code>heap</code> access method for inspiration.</p> <p>Its <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/access/heap/heapam_handler.c#L67">version</a> of <code>slot_callbacks</code> returns <code>&amp;TTSOpsBufferHeapTuple</code>:</p> <div class="highlight"><pre><span></span><span class="k">static</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="w"> </span><span class="o">*</span> <span class="nf">heapam_slot_callbacks</span><span class="p">(</span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="n">TTSOpsBufferHeapTuple</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>I have no idea what that means, but since it is defined in <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/executor/execTuples.c#L1080"><code>src/backend/executor/execTuples.c</code></a> it doesn't seem to be tied to the <code>heap</code> access method implementation. Let's try it.</p> <p class="note"> While it works initially, I found later on that <code>TTSOpsBufferHeapTuple</code> is not the right choice here. <code>TTSOpsVirtual</code> seems to be the right implementation, so that is what the diff below uses. 
</p><div class="highlight"><pre><span></span><span class="gu">@@ -19,7 +19,7 @@</span> <span class="w"> </span> Relation relation <span class="w"> </span>) { <span class="w"> </span> DEBUG_FUNC(); <span class="gd">- return NULL;</span> <span class="gi">+ return &amp;TTSOpsVirtual;</span> <span class="w"> </span>} <span class="w"> </span>static TableScanDesc memam_beginscan( </pre></div> <p>Build and run:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">psql:test.sql:6: server closed the connection unexpectedly</span> <span class="go"> This probably means the server terminated abnormally</span> <span class="go"> before or while processing the request.</span> <span class="go">psql:test.sql:6: error: connection to server was lost</span> </pre></div> <p>It still crashes. 
But this time in <code>/tmp/pgtam.log</code> we made it into a new method!</p> <div class="highlight"><pre><span></span><span class="n">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="p">.</span><span class="n">log</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="n">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="n">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="n">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span> <span class="n">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span> <span class="n">in</span><span class="w"> </span><span class="n">memam_beginscan</span> </pre></div> <h4 id="<code>scan_begin</code>"><code>scan_begin</code></h4><p>The function signature is:</p> <div class="highlight"><pre><span></span><span class="n">TableScanDesc</span><span class="w"> </span><span class="nf">heap_beginscan</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">Snapshot</span><span class="w"> </span><span class="n">snapshot</span><span class="p">,</span> <span class="w"> </span><span 
class="kt">int</span><span class="w"> </span><span class="n">nkeys</span><span class="p">,</span> <span class="w"> </span><span class="n">ScanKey</span><span class="w"> </span><span class="n">key</span><span class="p">,</span> <span class="w"> </span><span class="n">ParallelTableScanDesc</span><span class="w"> </span><span class="n">parallel_scan</span><span class="p">,</span> <span class="w"> </span><span class="n">uint32</span><span class="w"> </span><span class="n">flags</span> <span class="p">);</span> </pre></div> <p>Since we just implemented stub versions of all the methods, we've been returning <code>NULL</code>. Since we're failing in this function, maybe we should try returning something that isn't <code>NULL</code>.</p> <p>By looking at the definition of <code>TableScanDesc</code>, we can see it is a pointer to the <code>TableScanDescData</code> struct defined in <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/relscan.h#L52"><code>src/include/access/relscan.h</code></a>.</p> <p>Let's <code>malloc</code> a <code>TableScanDescData</code>, free it in <code>endscan</code>, and return the <code>TableScanDescData</code> instance in <code>beginscan</code>:</p> <div class="highlight"><pre><span></span><span class="gu">@@ -30,8 +30,12 @@</span> <span class="w"> </span> ParallelTableScanDesc parallel_scan, <span class="w"> </span> uint32 flags <span class="w"> </span>) { <span class="gi">+ TableScanDescData* scan = {};</span> <span class="w"> </span> DEBUG_FUNC(); <span class="gd">- return NULL;</span> <span class="gi">+</span> <span class="gi">+ scan = (TableScanDescData*)malloc(sizeof(TableScanDescData));</span> <span class="gi">+</span> <span class="gi">+ return (TableScanDesc)scan;</span> <span class="w"> </span>} <span class="w"> </span>static void memam_rescan( <span class="gu">@@ -87,6 +87,7 @@</span> <span class="w"> </span>static void memam_endscan(TableScanDesc sscan) { <span class="w"> 
</span> DEBUG_FUNC(); <span class="gi">+ free(sscan);</span> <span class="w"> </span>} </pre></div> <p>Build and run (you can do it on your own). No difference.</p> <p>I got stuck for a while here too. Clearly something must be filled out in this struct but it could be anything. Through trial and error I realized the one field that must be filled out is <code>scan-&gt;rs_rd</code>.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -34,6 +34,7 @@</span> <span class="w"> </span> DEBUG_FUNC(); <span class="w"> </span> scan = (TableScanDescData*)malloc(sizeof(TableScanDescData)); <span class="gi">+ scan-&gt;rs_rd = relation;</span> <span class="w"> </span> return (TableScanDesc)scan; <span class="w"> </span>} </pre></div> <p>We build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install $<span class="w"> </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql psql:test.sql:1:<span class="w"> </span>NOTICE:<span class="w"> </span>drop<span class="w"> </span>cascades<span class="w"> </span>to<span class="w"> </span>table<span class="w"> </span>x DROP<span class="w"> </span>EXTENSION CREATE<span class="w"> </span>EXTENSION CREATE<span class="w"> </span>TABLE <span class="w"> </span>a --- <span class="o">(</span><span class="m">0</span><span class="w"> </span>rows<span class="o">)</span> </pre></div> <p>And it works! It doesn't return anything but that's correct. There's nothing to return.</p> <p>So what if we actually want to return something? 
Let's check our logs in <code>/tmp/pgtam.log</code>.</p> <div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">pgtam</span><span class="o">.</span><span class="n">log</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_set_new_filelocator</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_relation_needs_toast_table</span> <span class="n">mem_tableam</span><span class="w"> </span><span class="n">handler</span><span class="w"> </span><span class="n">loaded</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_estimate_rel_size</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_slot_callbacks</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_beginscan</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_getnextslot</span> <span class="ow">in</span><span class="w"> </span><span class="n">memam_endscan</span> </pre></div> <p>Ok, I'm getting the gist of the API. 
A full table scan (which this is, because there are no indexes at play) starts with an initialization for a slot, then the scan begins, then <code>getnextslot</code> is called for each row, and then <code>endscan</code> is called to allow for cleanup.</p> <p>So let's try returning a row in <code>getnextslot</code>.</p> <h4 id="<code>getnextslot</code>"><code>getnextslot</code></h4><p>The <code>getnextslot</code> signature is:</p> <div class="highlight"><pre><span></span><span class="kt">bool</span><span class="w"> </span><span class="nf">memam_getnextslot</span><span class="p">(</span> <span class="w"> </span><span class="n">TableScanDesc</span><span class="w"> </span><span class="n">sscan</span><span class="p">,</span> <span class="w"> </span><span class="n">ScanDirection</span><span class="w"> </span><span class="n">direction</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span> <span class="p">);</span> </pre></div> <p>So the <code>sscan</code> should be what we returned from <code>beginscan</code> and the <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/access/tableam.h#L341">interface docs</a> say the current row gets stored in <code>slot</code>.</p> <p class="note"> The return value seems to indicate whether or not we've reached the end of the scan. However, if the <code>slot</code> is not filled out correctly, the scan will still end even if you <code>return true</code>. If the <code>slot</code> is filled out correctly and you unconditionally <code>return true</code>, you will crash the process. 
</p><p>Let's take a look at the <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/executor/tuptable.h#L114">definition</a> of <code>TupleTableSlot</code>:</p> <div class="highlight"><pre><span></span><span class="k">typedef</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="nc">TupleTableSlot</span> <span class="p">{</span> <span class="w"> </span><span class="n">NodeTag</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> <span class="cp">#define FIELDNO_TUPLETABLESLOT_FLAGS 1</span> <span class="w"> </span><span class="n">uint16</span><span class="w"> </span><span class="n">tts_flags</span><span class="p">;</span><span class="w"> </span><span class="cm">/* Boolean states */</span> <span class="cp">#define FIELDNO_TUPLETABLESLOT_NVALID 2</span> <span class="w"> </span><span class="n">AttrNumber</span><span class="w"> </span><span class="n">tts_nvalid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* # of valid values in tts_values */</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">TupleTableSlotOps</span><span class="w"> </span><span class="o">*</span><span class="k">const</span><span class="w"> </span><span class="n">tts_ops</span><span class="p">;</span><span class="w"> </span><span class="cm">/* implementation of slot */</span> <span class="cp">#define FIELDNO_TUPLETABLESLOT_TUPLEDESCRIPTOR 4</span> <span class="w"> </span><span class="n">TupleDesc</span><span class="w"> </span><span class="n">tts_tupleDescriptor</span><span class="p">;</span><span class="w"> </span><span class="cm">/* slot&#39;s tuple descriptor */</span> <span class="cp">#define FIELDNO_TUPLETABLESLOT_VALUES 5</span> <span class="w"> </span><span class="n">Datum</span><span class="w"> </span><span class="o">*</span><span class="n">tts_values</span><span class="p">;</span><span class="w"> 
</span><span class="cm">/* current per-attribute values */</span> <span class="cp">#define FIELDNO_TUPLETABLESLOT_ISNULL 6</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="o">*</span><span class="n">tts_isnull</span><span class="p">;</span><span class="w"> </span><span class="cm">/* current per-attribute isnull flags */</span> <span class="w"> </span><span class="n">MemoryContext</span><span class="w"> </span><span class="n">tts_mcxt</span><span class="p">;</span><span class="w"> </span><span class="cm">/* slot itself is in this context */</span> <span class="w"> </span><span class="n">ItemPointerData</span><span class="w"> </span><span class="n">tts_tid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* stored tuple&#39;s tid */</span> <span class="w"> </span><span class="n">Oid</span><span class="w"> </span><span class="n">tts_tableOid</span><span class="p">;</span><span class="w"> </span><span class="cm">/* table oid of tuple */</span> <span class="p">}</span><span class="w"> </span><span class="n">TupleTableSlot</span><span class="p">;</span> </pre></div> <p><code>tts_values</code> is an array of <code>Datum</code> (which is a Postgres value). So that sounds like the actual values of the row. The <code>tts_isnull</code> field also looks important since that seems to be whether each value in the row is null or not. And <code>tts_nvalid</code> sounds important too since presumably it's the length of the <code>tts_isnull</code> and <code>tts_values</code> arrays.</p> <p>The rest of it may or may not be important. 
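</p> <p>To make that layout concrete, here is a toy model of those three fields in plain C. It is only a sketch under assumptions: <code>MockDatum</code>, <code>MockSlot</code>, <code>mock_slot_sum</code> and <code>mock_slot_demo</code> are made-up names for illustration, not Postgres types or functions, and it assumes the count field really does say how many entries of the two arrays are usable.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; these are NOT the real Postgres types. */
typedef uintptr_t MockDatum;

struct MockSlot {
    int16_t nvalid;    /* how many entries of values/isnull are usable */
    MockDatum *values; /* one value per column */
    bool *isnull;      /* isnull[i] says whether values[i] should be ignored */
};

/* Sum the non-null columns of a row, treating each datum as a small integer. */
long mock_slot_sum(const struct MockSlot *slot) {
    long total = 0;
    assert(slot != NULL);
    for (int i = 0; i < slot->nvalid; i++) {
        if (!slot->isnull[i]) {
            total += (long)slot->values[i];
        }
    }
    return total;
}

/* Build the row (23, NULL, 101) in local storage and sum its non-null cells. */
long mock_slot_demo(void) {
    MockDatum values[3] = {23, 0, 101};
    bool isnull[3] = {false, true, false};
    struct MockSlot slot = {3, values, isnull};
    return mock_slot_sum(&slot);
}
```

<p>The only point of the sketch is that the two arrays are read in parallel: entry <code>i</code> of one says whether entry <code>i</code> of the other means anything.</p> <p>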
Let's try filling out these three fields though and see what happens.</p> <h4 id="datum">Datum</h4><p>Back in the <a href="https://www.postgresql.org/docs/current/xfunc-c.html">Postgres C extension documentation</a>, we can see some simple examples of converting between C types and Postgres's Datum type.</p> <p>For example:</p> <div class="highlight"><pre><span></span><span class="n">Datum</span> <span class="nf">add_one</span><span class="p">(</span><span class="n">PG_FUNCTION_ARGS</span><span class="p">)</span> <span class="p">{</span> <span class="w"> </span><span class="n">int32</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PG_GETARG_INT32</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">PG_RETURN_INT32</span><span class="p">(</span><span class="n">arg</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>If we look at the definition of <code>PG_RETURN_INT32</code> in <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/include/fmgr.h#L354"><code>src/include/fmgr.h</code></a>, we see:</p> <div class="highlight"><pre><span></span><span class="cp">#define PG_RETURN_INT32(x) return Int32GetDatum(x)</span> </pre></div> <p>So <code>Int32GetDatum()</code> is the function we'll use to set a <code>Datum</code> for a cell in a row.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -54,13 +54,26 @@</span> <span class="w"> </span> DEBUG_FUNC(); <span class="w"> </span>} <span class="gi">+static bool done = false;</span> <span class="w"> </span>static bool memam_getnextslot( <span class="w"> </span> TableScanDesc sscan, <span class="w"> </span> ScanDirection direction, <span class="w"> </span> TupleTableSlot *slot <span class="w"> 
</span>) { <span class="w"> </span> DEBUG_FUNC(); <span class="gd">- return false;</span> <span class="gi">+</span> <span class="gi">+ if (done) {</span> <span class="gi">+ return false;</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ slot-&gt;tts_nvalid = 1;</span> <span class="gi">+ slot-&gt;tts_values = (Datum*)malloc(sizeof(Datum) * slot-&gt;tts_nvalid);</span> <span class="gi">+ slot-&gt;tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */);</span> <span class="gi">+ slot-&gt;tts_isnull = (bool*)malloc(sizeof(bool) * slot-&gt;tts_nvalid);</span> <span class="gi">+ slot-&gt;tts_isnull[0] = false;</span> <span class="gi">+ done = true;</span> <span class="gi">+</span> <span class="gi">+ return true;</span> <span class="w"> </span>} <span class="w"> </span>static IndexFetchTableData* memam_index_fetch_begin(Relation rel) { </pre></div> <p>The goal is that we return a single row and then exit the scan. It will have one 32-bit integer cell (remember we created the table <code>CREATE TABLE x (a INT)</code>; <code>INT</code> is shorthand for <code>INT4</code> which is a 32-bit integer) that will have the value <code>314</code>.</p> <p>But if we build and run this, we get no rows.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go"> a</span> <span class="go">---</span> <span class="gp gp-VirtualEnv">(0 rows)</span> </pre></div> <p>I got stuck for a while here. 
Plugging my <code>getnextslot</code> code into ChatGPT helped. One thing it gave me to try was calling <code>ExecStoreVirtualTuple</code> on the <code>slot</code>. I noticed that the built-in <code>heap</code> access method <a href="https://github.com/postgres/postgres/blob/849172ff4883d44168f96f39d3fde96d0aa34c99/src/backend/access/heap/heapam.c#L1159">also called a function like this</a> in <code>getnextslot</code>.</p> <p>And I realized that <code>tts_nvalid</code> is already set up and the memory for <code>tts_values</code> and <code>tts_isnull</code> is already allocated. So the code became a little simpler.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -66,11 +66,9 @@</span> <span class="w"> </span> return false; <span class="w"> </span> } <span class="gd">- slot-&gt;tts_nvalid = 1;</span> <span class="gd">- slot-&gt;tts_values = (Datum*)malloc(sizeof(Datum) * slot-&gt;tts_nvalid);</span> <span class="w"> </span> slot-&gt;tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */); <span class="gd">- slot-&gt;tts_isnull = (bool*)malloc(sizeof(bool) * slot-&gt;tts_nvalid);</span> <span class="w"> </span> slot-&gt;tts_isnull[0] = false; <span class="gi">+ ExecStoreVirtualTuple(slot);</span> <span class="w"> </span> done = true; <span class="w"> </span> return true; </pre></div> <p>Build and run:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 314</span> <span 
class="gp gp-VirtualEnv">(1 row)</span> </pre></div> <p>Fantastic!</p> <h3 id="creating-a-table">Creating a table</h3><p>Now that we've proven we can return random data, let's set up infrastructure for storing tables in memory.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -15,6 +15,41 @@</span> <span class="w"> </span>FILE* fd; <span class="w"> </span>#define DEBUG_FUNC() fprintf(fd, &quot;in %s\n&quot;, __func__); <span class="gi">+</span> <span class="gi">+struct Column {</span> <span class="gi">+ int value;</span> <span class="gi">+};</span> <span class="gi">+</span> <span class="gi">+struct Row {</span> <span class="gi">+ struct Column* columns;</span> <span class="gi">+ size_t ncolumns;</span> <span class="gi">+};</span> <span class="gi">+</span> <span class="gi">+#define MAX_ROWS 100</span> <span class="gi">+struct Table {</span> <span class="gi">+ char* name;</span> <span class="gi">+ struct Row* rows;</span> <span class="gi">+ size_t nrows;</span> <span class="gi">+};</span> <span class="gi">+</span> <span class="gi">+#define MAX_TABLES 100</span> <span class="gi">+struct Database {</span> <span class="gi">+ struct Table* tables;</span> <span class="gi">+ size_t ntables;</span> <span class="gi">+};</span> <span class="gi">+</span> <span class="gi">+struct Database* database;</span> <span class="gi">+</span> <span class="gi">+static void get_table(struct Table** table, Relation relation) {</span> <span class="gi">+ char* this_name = NameStr(relation-&gt;rd_rel-&gt;relname);</span> <span class="gi">+ for (size_t i = 0; i &lt; database-&gt;ntables; i++) {</span> <span class="gi">+ if (strcmp(database-&gt;tables[i].name, this_name) == 0) {</span> <span class="gi">+ *table = &amp;database-&gt;tables[i];</span> <span class="gi">+ return;</span> <span class="gi">+ }</span> <span class="gi">+ }</span> <span class="gi">+}</span> <span class="gi">+</span> <span class="w"> </span>static const TupleTableSlotOps* memam_slot_callbacks( <span 
class="w"> </span> Relation relation <span class="w"> </span>) { </pre></div> <p>Based on what we logged in <code>/tmp/pgtam.log</code> it seems like <code>memam_relation_set_new_filelocator</code> is called when a new table is created. So let's handle adding a new table there.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -233,7 +268,16 @@</span> <span class="w"> </span> TransactionId *freezeXid, <span class="w"> </span> MultiXactId *minmulti <span class="w"> </span>) { <span class="gi">+ struct Table table = {};</span> <span class="w"> </span> DEBUG_FUNC(); <span class="gi">+</span> <span class="gi">+ table.name = strdup(NameStr(rel-&gt;rd_rel-&gt;relname));</span> <span class="gi">+ fprintf(fd, &quot;Created table: [%s]\n&quot;, table.name);</span> <span class="gi">+ table.rows = (struct Row*)malloc(sizeof(struct Row) * MAX_ROWS);</span> <span class="gi">+ table.nrows = 0;</span> <span class="gi">+</span> <span class="gi">+ database-&gt;tables[database-&gt;ntables] = table;</span> <span class="gi">+ database-&gt;ntables++;</span> <span class="w"> </span>} <span class="w"> </span>static void memam_relation_nontransactional_truncate( </pre></div> <p>Finally, we'll initialize the in-memory <code>Database*</code> when the handler is loaded.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -428,5 +472,11 @@</span> <span class="w"> </span> setvbuf(fd, NULL, _IONBF, 0); // Prevent buffering <span class="w"> </span> fprintf(fd, &quot;\n\nmem_tableam handler loaded\n&quot;); <span class="gi">+ if (database == NULL) {</span> <span class="gi">+ database = (struct Database*)malloc(sizeof(struct Database));</span> <span class="gi">+ database-&gt;ntables = 0;</span> <span class="gi">+ database-&gt;tables = (struct Table*)malloc(sizeof(struct Table) * MAX_TABLES);</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="w"> </span> PG_RETURN_POINTER(&amp;memam_methods); <span class="w"> </span>} </pre></div> <p>If we build 
and run, we won't notice anything new.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 314</span> <span class="gp gp-VirtualEnv">(1 row)</span> </pre></div> <p>But we should see a message in <code>/tmp/pgtam.log</code> about the new table being created.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log <span class="go">mem_tableam handler loaded</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_set_new_filelocator</span> <span class="go">Created table: [x]</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_needs_toast_table</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_estimate_rel_size</span> <span class="go">in memam_slot_callbacks</span> <span class="go">in memam_beginscan</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_endscan</span> </pre></div> <p>And there it is! 
Creation looks good.</p> <h3 id="inserting-rows">Inserting rows</h3><p>Let's add <code>INSERT INTO x VALUES (23), (101);</code> to <code>test.sql</code> and run the SQL script.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">INSERT 0 2</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 314</span> <span class="gp gp-VirtualEnv">(1 row)</span> </pre></div> <p>And let's check the log to see what method is called when we try to <code>INSERT</code>.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log <span class="go">mem_tableam handler loaded</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_set_new_filelocator</span> <span class="go">Created table: [x]</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_needs_toast_table</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_slot_callbacks</span> <span class="go">in memam_tuple_insert</span> <span class="go">in memam_tuple_insert</span> <span class="go">in memam_estimate_rel_size</span> <span class="go">in memam_slot_callbacks</span> <span class="go">in memam_beginscan</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_endscan</span> </pre></div> <p><code>tuple_insert</code> seems to be the method! Looks like it gets called once for each row to insert. 
Perfect.</p> <p>The signature for <code>tuple_insert</code> is:</p> <div class="highlight"><pre><span></span><span class="kt">void</span><span class="w"> </span><span class="nf">memam_tuple_insert</span><span class="p">(</span> <span class="w"> </span><span class="n">Relation</span><span class="w"> </span><span class="n">relation</span><span class="p">,</span> <span class="w"> </span><span class="n">TupleTableSlot</span><span class="w"> </span><span class="o">*</span><span class="n">slot</span><span class="p">,</span> <span class="w"> </span><span class="n">CommandId</span><span class="w"> </span><span class="n">cid</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">options</span><span class="p">,</span> <span class="w"> </span><span class="n">BulkInsertState</span><span class="w"> </span><span class="n">bistate</span> <span class="p">);</span> </pre></div> <p>We can get the table name from <code>relation</code>, and this time we read from <code>slot-&gt;tts_values</code> rather than writing to <code>slot</code>.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -141,7 +141,38 @@</span> <span class="w"> </span> int options, <span class="w"> </span> BulkInsertState bistate <span class="w"> </span>) { <span class="gi">+ TupleDesc desc = RelationGetDescr(relation);</span> <span class="gi">+ struct Table* table = NULL;</span> <span class="gi">+ struct Column column = {};</span> <span class="gi">+ struct Row row = {};</span> <span class="gi">+</span> <span class="w"> </span> DEBUG_FUNC(); <span class="gi">+</span> <span class="gi">+ get_table(&amp;table, relation);</span> <span class="gi">+ if (table == NULL) {</span> <span class="gi">+ elog(ERROR, &quot;table not found&quot;);</span> <span class="gi">+ return;</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ if (table-&gt;nrows == MAX_ROWS) {</span> <span class="gi">+ elog(ERROR, &quot;cannot insert 
more rows&quot;);</span> <span class="gi">+ return;</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ row.ncolumns = desc-&gt;natts;</span> <span class="gi">+ Assert(slot-&gt;tts_nvalid == row.ncolumns);</span> <span class="gi">+ Assert(row.ncolumns &gt; 0);</span> <span class="gi">+</span> <span class="gi">+ row.columns = (struct Column*)malloc(sizeof(struct Column) * row.ncolumns);</span> <span class="gi">+ for (size_t i = 0; i &lt; row.ncolumns; i++) {</span> <span class="gi">+ Assert(desc-&gt;attrs[i].atttypid == INT4OID);</span> <span class="gi">+ column.value = DatumGetInt32(slot-&gt;tts_values[i]);</span> <span class="gi">+ row.columns[i] = column;</span> <span class="gi">+ fprintf(fd, &quot;Got value: %d\n&quot;, column.value);</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ table-&gt;rows[table-&gt;nrows] = row;</span> <span class="gi">+ table-&gt;nrows++;</span> <span class="w"> </span>} <span class="w"> </span>static void memam_tuple_insert_speculative( </pre></div> <p>Build and run and again we won't notice anything new.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">INSERT 0 2</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 314</span> <span class="gp gp-VirtualEnv">(1 row)</span> </pre></div> <p>But if we check the logs, we should see the two column values we inserted, one for each row.</p> <div class="highlight"><pre><span></span><span 
class="gp">$ </span>cat<span class="w"> </span>/tmp/pgtam.log <span class="go">mem_tableam handler loaded</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_set_new_filelocator</span> <span class="go">Created table: [x]</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_relation_needs_toast_table</span> <span class="go">mem_tableam handler loaded</span> <span class="go">in memam_slot_callbacks</span> <span class="go">in memam_tuple_insert</span> <span class="go">Got value: 23</span> <span class="go">in memam_tuple_insert</span> <span class="go">Got value: 101</span> <span class="go">in memam_estimate_rel_size</span> <span class="go">in memam_slot_callbacks</span> <span class="go">in memam_beginscan</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_getnextslot</span> <span class="go">in memam_endscan</span> </pre></div> <p>Woohoo!</p> <h3 id="un-hardcoding-the-scan">Un-hardcoding the scan</h3><p>The final thing we need to do is drop the hardcoded <code>314</code> we returned from <code>getnextslot</code> and instead we need to look up the current table and return rows from it. This also means we need to keep track of which row we're on. 
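</p> <p>Stripped of everything Postgres-specific, the idea looks roughly like the following. This is a minimal sketch, not the extension's code: <code>ToyTable</code>, <code>ToyScan</code>, <code>toy_beginscan</code>, <code>toy_getnext</code> and <code>toy_count_rows</code> are hypothetical names, and rows are assumed to be single integers.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the table and the scan state; not Postgres types. */
struct ToyTable {
    const int *rows;
    size_t nrows;
};

struct ToyScan {
    const struct ToyTable *table;
    size_t cursor; /* index of the next row to hand back */
};

/* Start a scan positioned at the first row. */
struct ToyScan toy_beginscan(const struct ToyTable *table) {
    struct ToyScan scan = {table, 0};
    assert(table != NULL);
    return scan;
}

/* Copy the next row into *out and advance; return false once rows run out. */
bool toy_getnext(struct ToyScan *scan, int *out) {
    if (scan->cursor == scan->table->nrows) {
        return false;
    }
    *out = scan->table->rows[scan->cursor];
    scan->cursor++;
    return true;
}

/* Drain a fresh scan and count how many rows it produced. */
size_t toy_count_rows(const struct ToyTable *table) {
    struct ToyScan scan = toy_beginscan(table);
    size_t n = 0;
    int value;
    while (toy_getnext(&scan, &value)) {
        n++;
    }
    return n;
}
```

<p>Keeping the cursor in the per-scan state rather than in a global means each scan starts again from row zero.</p> <p>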
So <code>beginscan</code> will also need to change slightly.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -57,6 +56,14 @@</span> <span class="w"> </span> return &amp;TTSOpsVirtual; <span class="w"> </span>} <span class="gi">+</span> <span class="gi">+struct MemScanDesc {</span> <span class="gi">+ TableScanDescData rs_base; // Base class from access/relscan.h.</span> <span class="gi">+</span> <span class="gi">+ // Custom data.</span> <span class="gi">+ uint32 cursor;</span> <span class="gi">+};</span> <span class="gi">+</span> <span class="w"> </span>static TableScanDesc memam_beginscan( <span class="w"> </span> Relation relation, <span class="w"> </span> Snapshot snapshot, <span class="gu">@@ -65,11 +72,13 @@</span> <span class="w"> </span> ParallelTableScanDesc parallel_scan, <span class="w"> </span> uint32 flags <span class="w"> </span>) { <span class="gd">- TableScanDescData* scan = {};</span> <span class="gd">- DEBUG_FUNC();</span> <span class="gi">+ struct MemScanDesc* scan;</span> <span class="gd">- scan = (TableScanDescData*)malloc(sizeof(TableScanDescData));</span> <span class="gd">- scan-&gt;rs_rd = relation;</span> <span class="gi">+ DEBUG_FUNC();</span> <span class="gi">+</span> <span class="gi">+ scan = (struct MemScanDesc*)malloc(sizeof(struct MemScanDesc));</span> <span class="gi">+ scan-&gt;rs_base.rs_rd = relation;</span> <span class="gi">+ scan-&gt;cursor = 0;</span> <span class="w"> </span> return (TableScanDesc)scan; <span class="w"> </span>} <span class="gu">@@ -89,23 +97,26 @@</span> <span class="w"> </span> DEBUG_FUNC(); <span class="w"> </span>} <span class="gd">-static bool done = false;</span> <span class="w"> </span>static bool memam_getnextslot( <span class="w"> </span> TableScanDesc sscan, <span class="w"> </span> ScanDirection direction, <span class="w"> </span> TupleTableSlot *slot <span class="w"> </span>) { <span class="gi">+ struct MemScanDesc* mscan = (struct MemScanDesc*)sscan;</span> <span class="gi">+ struct 
Table* table = NULL;</span> <span class="w"> </span> DEBUG_FUNC(); <span class="gd">- if (done) {</span> <span class="gi">+ ExecClearTuple(slot);</span> <span class="gi">+</span> <span class="gi">+ get_table(&amp;table, mscan-&gt;rs_base.rs_rd);</span> <span class="gi">+ if (table == NULL || mscan-&gt;cursor == table-&gt;nrows) {</span> <span class="w"> </span> return false; <span class="w"> </span> } <span class="gd">- slot-&gt;tts_values[0] = Int32GetDatum(314 /* Some unique-looking value */);</span> <span class="gi">+ slot-&gt;tts_values[0] = Int32GetDatum(table-&gt;rows[mscan-&gt;cursor].columns[0].value);</span> <span class="w"> </span> slot-&gt;tts_isnull[0] = false; <span class="w"> </span> ExecStoreVirtualTuple(slot); <span class="gd">- done = true;</span> <span class="gd">-</span> <span class="gi">+ mscan-&gt;cursor++;</span> <span class="w"> </span> return true; <span class="w"> </span>} </pre></div> <p>Let's try it out.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>sudo<span class="w"> </span>make<span class="w"> </span>install <span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to table x</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">INSERT 0 2</span> <span class="go"> a</span> <span class="go">-----</span> <span class="go"> 23</span> <span class="go"> 101</span> <span class="gp gp-VirtualEnv">(2 rows)</span> </pre></div> <p>And there we have it. :)</p> <h3 id="awesome-sql-power">Awesome SQL power</h3><p>So we tried one table and we tried a <code>SELECT</code> without anything else.</p> <p>What happens if we use more of SQL? Let's create another table and try some more complex queries. 
Edit <code>test.sql</code>:</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">pgtam</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="n">pgtam</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">x</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">y</span><span class="p">(</span><span class="n">b</span><span class="w"> </span><span class="nb">INT</span><span class="p">)</span><span class="w"> </span><span class="k">USING</span><span class="w"> </span><span class="n">mem</span><span class="p">;</span> <span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">23</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">101</span><span class="p">);</span> <span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="p">;</span> <span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span 
class="o">+</span><span class="w"> </span><span class="mi">100</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span> <span class="k">SELECT</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">GROUP</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">COUNT</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="k">DESC</span><span class="p">;</span> <span class="k">SELECT</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">y</span><span class="p">;</span> </pre></div> <p>Run it:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>/usr/local/pgsql/bin/psql<span class="w"> </span>postgres<span class="w"> </span>-f<span class="w"> </span>test.sql <span class="go">psql:test.sql:1: NOTICE: drop cascades to 2 other objects</span> <span class="go">DETAIL: drop cascades to table x</span> <span class="go">drop cascades to table y</span> <span class="go">DROP EXTENSION</span> <span class="go">CREATE EXTENSION</span> <span class="go">CREATE TABLE</span> <span class="go">CREATE TABLE</span> <span class="go">INSERT 0 2</span> <span class="go"> a</span> <span 
class="go">-----</span> <span class="go"> 23</span> <span class="go"> 101</span> <span class="gp gp-VirtualEnv">(2 rows)</span> <span class="go"> ?column?</span> <span class="go">----------</span> <span class="go"> 123</span> <span class="gp gp-VirtualEnv">(1 row)</span> <span class="go"> a | count</span> <span class="go">-----+-------</span> <span class="go"> 23 | 1</span> <span class="go"> 101 | 1</span> <span class="gp gp-VirtualEnv">(2 rows)</span> <span class="go"> b</span> <span class="go">---</span> <span class="gp gp-VirtualEnv">(0 rows)</span> </pre></div> <p>Pretty sweet!</p> <h3 id="next-steps">Next steps</h3><p>It would be neat to build a storage engine that reads from and writes to a CSV a la MySQL's CSV storage engine. Or a storage engine that uses RocksDB.</p> <p>It would also be good to figure out how indexes work, how deletions work, and how updates and DDL beyond <code>CREATE</code> work.</p> <p>And I should probably contribute some of this to the <a href="https://www.postgresql.org/docs/current/tableam.html">table access method</a> docs which are pretty sparse at the moment.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I&#39;ve been working this week to understand Postgres Table Access Methods for alternative storage engines.<br><br>Especially challenging because the documentation is pretty sparse and few minimal implementations exist.<br><br>I wrote up my approach!<a href="https://t.co/LQGglRkev5">https://t.co/LQGglRkev5</a> <a href="https://t.co/v0MeOu4Hbr">pic.twitter.com/v0MeOu4Hbr</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1719873793693221157?ref_src=twsrc%5Etfw">November 2, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-11-01-postgres-table-access-methods.htmlWed, 01 Nov 2023 00:00:00 +0000io_uring basics: Writing a file to 
diskhttp://notes.eatonphil.com/2023-10-19-write-file-to-disk-with-io_uring.html<p>King and I <a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/">wrote a blog post</a> about building an event-driven cross-platform IO library that used io_uring on Linux. We sketched out how it works at a high level but I hadn't yet internalized how you actually code with io_uring. So I strapped myself down this week and wrote <a href="https://github.com/eatonphil/io-playground">some benchmarks</a> to build my intuition about io_uring and other IO models.</p> <p>I started with implementations in Go and ported them to Zig to make sure I had done the Go versions decently. And I got some help from King and other internetters to find some inefficiencies in my code.</p> <p>This post will walk through my process, building up increasingly efficient (and slightly more complex) ways to write an entire file to disk with io_uring, from Go and Zig.</p> <p>Notably, we're not going to <code>fsync()</code> and we're not going to use <code>O_DIRECT</code>. So we won't be testing the entire IO pipeline from userland to disk hardware but just how fast IO gets to the kernel. The focus of this post is more on IO methods and using io_uring, not absolute numbers.</p> <p>All code for this post is <a href="https://github.com/eatonphil/io_uring-basics-writing-file">available on GitHub</a>.</p> <p class="note"> This code is going to indirectly show some differences in timing between Go and Zig. I couldn't care less about benchmarketing. And I hope something about Zig vs Go is not what you take away from this post either. <br /> <br /> The goal is to build an intuition and be generally correct. Observing the same relative behavior between implementations across two languages helps me gain confidence that what I'm doing is correct. </p><h3 id="io_uring">io_uring</h3><p>With normal blocking syscalls you just call <code>read()</code> or <code>write()</code> and wait for the results. 
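</p> <p>For contrast with what follows, here is roughly what that blocking model looks like in C. It is a minimal sketch under assumptions: <code>write_all</code> is a hypothetical helper name, not something from this post's repo, and it assumes a file descriptor where <code>write()</code> either makes progress or fails.</p>

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Write all n bytes of buf to fd using plain blocking write() calls.
 * Returns 0 on success and -1 on error (errno is left as write() set it).
 */
int write_all(int fd, const char *buf, size_t n) {
    size_t written = 0;
    assert(buf != NULL);
    while (written < n) {
        /* Blocks until the kernel has accepted at least some of the bytes. */
        ssize_t rc = write(fd, buf + written, n - written);
        if (rc < 0) {
            if (errno == EINTR) {
                continue; /* interrupted by a signal before writing; retry */
            }
            return -1;
        }
        written += (size_t)rc; /* a short write just means we loop again */
    }
    return 0;
}
```

<p>Each <code>write()</code> here is its own trip into the kernel, and the caller does nothing else until it returns.</p> <p>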
io_uring is one of Linux's more powerful <em>asynchronous</em> IO offerings. Unlike epoll, you can use io_uring with both files and network connections. And unlike epoll, which only tells you when a file descriptor is ready, io_uring has the kernel perform the syscall for you.</p> <p>To interact with io_uring, you register a submission queue for syscalls and their arguments. And you register a completion queue for syscall results.</p> <p>You can batch many syscalls in a single call to io_uring, effectively turning up to N (4096 at most) syscalls into just one syscall. The kernel still does all the work of the N syscalls but you avoid some overhead.</p> <p>As you check the completion queue and handle completed submissions, the submission queue is also freed up, fully or partially, and you can add more submissions.</p> <p>For a more complete understanding, check out the kernel document <a href="https://kernel.dk/io_uring.pdf">Efficient IO with io_uring</a>.</p> <h3 id="io_uring-vs-liburing">io_uring vs liburing</h3><p>io_uring is a complex, low-level interface. Shuveb Hussain has <a href="https://unixism.net/2020/04/io-uring-by-example-part-1-introduction/">an excellent series</a> on programming io_uring. But that was too low-level for me as I was trying to figure out how to just get something working.</p> <p>Instead, most people use <a href="https://github.com/axboe/liburing">liburing</a> or a ported version of it like <a href="https://github.com/ziglang/zig/blob/master/lib/std/os/linux/io_uring.zig">the Zig standard library's io_uring.zig</a> or <a href="https://github.com/Iceber/iouring-go">Iceber's iouring-go</a>.</p> <p>io_uring started clicking for me when I tried out the iouring-go library. 
So we'll start there.</p> <h3 id="boilerplate">Boilerplate</h3><p>First off, let's set up some boilerplate for the Go and Zig code.</p> <p>In main.go add:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;assert&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">4096</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readNBytes</span><span class="p">(</span><span class="nx">fn</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">f</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">buffer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="nx">buffer</span><span class="p">[:</span><span class="nx">read</span><span class="p">]</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">data</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="p">[]</span><span 
class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="s">&quot;out.bin&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_TRUNC</span><span class="p">,</span><span class="w"> </span><span class="mo">0755</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t1</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span> <span class="w"> </span><span class="nx">fn</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Sub</span><span class="p">(</span><span class="nx">t1</span><span class="p">).</span><span class="nx">Seconds</span><span class="p">()</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;,%f,%f\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">))</span><span class="o">/</span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span 
class="nx">Equal</span><span class="p">(</span><span class="nx">readNBytes</span><span class="p">(</span><span class="s">&quot;out.bin&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)),</span><span class="w"> </span><span class="nx">data</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>And in main.zig add:</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">OUT_FILE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;out.bin&quot;</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="o">:</span><span class="w"> </span><span class="kt">u64</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">4096</span><span class="p">;</span> <span class="k">fn</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">filename</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">n</span><span 
class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFile</span><span class="p">(</span><span class="n">filename</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">buf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span> <span class="w"> </span><span 
class="kr">var</span><span class="w"> </span><span class="n">written</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">written</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">nwritten</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="nb">@min</span><span class="p">(</span><span class="n">buf</span><span class="p">.</span><span class="n">len</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">written</span><span class="p">)]);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">nwritten</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">EndOfStream</span><span class="p">;</span> <span class="w"> </span><span class="nb">@memcpy</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">written</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">nwritten</span><span class="p">],</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">nwritten</span><span class="p">]);</span> <span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">nwritten</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">written</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">n</span><span class="p">);</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">data</span><span class="p">;</span> <span class="p">}</span> <span class="kr">const</span><span class="w"> </span><span class="n">Benchmark</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">t</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Timer</span><span class="p">,</span> <span class="w"> </span><span class="n">file</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">File</span><span class="p">,</span> <span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span 
class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">Benchmark</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">getStdOut</span><span class="p">().</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name</span><span class="p">});</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">createFile</span><span class="p">(</span><span class="n">OUT_FILE</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">truncate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">{</span> <span class="w"> 
</span><span class="p">.</span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">Timer</span><span class="p">.</span><span class="n">start</span><span class="p">(),</span> <span class="w"> </span><span class="p">.</span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">file</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">stop</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Benchmark</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">f64</span><span class="p">,</span><span class="w"> </span><span class="nb">@floatFromInt</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">t</span><span class="p">.</span><span class="n">read</span><span class="p">()))</span><span class="w"> 
</span><span class="o">/</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">time</span><span class="p">.</span><span class="n">ns_per_s</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">io</span><span class="p">.</span><span class="n">getStdOut</span><span class="p">().</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;,{d},{d}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">f64</span><span class="p">,</span><span class="w"> </span><span class="nb">@floatFromInt</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">in</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">b</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span 
class="n">OUT_FILE</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">in</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">data</span><span class="p">));</span> <span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">in</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <h3 id="keep-it-simple:-write()">Keep it simple: write()</h3><p>Now let's add the naive version of writing bytes to disk: calling <code>write()</code> repeatedly until all data has been written to disk.</p> <p>In <code>main.go</code>:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">104857600</span><span class="w"> </span><span class="c1">// 100MiB</span> <span class="w"> </span><span class="nx">data</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="nx">readNBytes</span><span class="p">(</span><span class="s">&quot;/dev/random&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">RUNS</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">10</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">RUNS</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="s">&quot;blocking&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span 
class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="o">-</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">data</span><span class="p">[</span><span class="nx">i</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">i</span><span class="o">+</span><span class="nx">size</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assert</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And in 
<code>main.zig</code>:</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">;</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SIZE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">104857600</span><span class="p">;</span><span class="w"> </span><span class="c1">// 100MiB</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;/dev/random&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RUNS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span> <span class="w"> 
</span><span class="kr">var</span><span class="w"> </span><span class="n">run</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">RUNS</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;blocking&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span 
class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">]);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="n">size</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Let's build and run these programs and store the results as CSV that we can analyze with DuckDB.</p> <p>Go first:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain<span class="w"> </span>main.go $<span class="w"> </span>./gomain<span class="w"> </span>&gt;<span class="w"> </span>go.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;go.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>blocking</td> <td>0.07251540000000001s</td> <td>1.4GB/s</td> </tr> </tbody> </table> <p>And Zig:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>blocking</td> <td>0.0656907669s</td> <td>1.5GB/s</td> </tr> </tbody> </table> <p>Alright, 
we've got a baseline now and both language implementations are in the same ballpark.</p> <p>Let's add a simple io_uring version!</p> <h3 id="io_uring,-1-entry,-go">io_uring, 1 entry, Go</h3><p>The <a href="https://github.com/Iceber/iouring-go#quickstart">iouring-go</a> library has really excellent documentation for getting started.</p> <p>To keep it simple, we'll use io_uring with only 1 entry. Add the following to <code>func main()</code> after the existing <code>benchmark()</code> call in <code>main.go</code>:</p> <div class="highlight"><pre><span></span><span class="n">benchmark</span><span class="p">(</span><span class="s">&quot;io_uring&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="n">func</span><span class="p">(</span><span class="n">f</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">os</span><span class="p">.</span><span class="n">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">iour</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">iouring</span><span class="p">.</span><span class="n">New</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">defer</span><span class="w"> </span><span class="n">iour</span><span class="p">.</span><span 
class="n">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">data</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nl">size</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">data</span><span class="p">)</span><span class="o">-</span><span class="n">i</span><span class="p">)</span> <span class="w"> </span><span class="nl">prepRequest</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">iouring</span><span class="p">.</span><span class="n">Pwrite</span><span class="p">(</span><span class="kt">int</span><span class="p">(</span><span class="n">f</span><span class="p">.</span><span class="n">Fd</span><span class="p">()),</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">i</span><span class="o">+</span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">uint64</span><span class="p">(</span><span class="n">i</span><span class="p">))</span> <span class="w"> </span><span 
class="n">res</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">iour</span><span class="p">.</span><span class="n">SubmitRequest</span><span class="p">(</span><span class="n">prepRequest</span><span class="p">,</span><span class="w"> </span><span class="nb">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">&lt;-</span><span class="n">res</span><span class="p">.</span><span class="n">Done</span><span class="p">()</span> <span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">ReturnInt</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">panic</span><span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">assert</span><span class="p">(</span><span class="n">size</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">n</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span 
class="p">})</span> </pre></div> <p>Note that <code>benchmark</code> takes care of <code>f.Seek(0)</code> before each run. It also checks that the file contents match the input <code>data</code>, so each benchmark is validated for correctness.</p> <p>Alright, let's run this new Go implementation with io_uring!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>gomain $<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy $<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain<span class="w"> </span>main.go $<span class="w"> </span>./gomain<span class="w"> </span>&gt;<span class="w"> </span>go.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;go.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>blocking</td> <td>0.0811486s</td> <td>1.3GB/s</td> </tr> <tr> <td>io_uring</td> <td>0.5083049999999999s</td> <td>213.2MB/s</td> </tr> </tbody> </table> <p>Well that looks terrible.</p> <p>Let's port it to Zig to see if we notice the same behavior there.</p> <h3 id="io_uring,-1-entry,-zig">io_uring, 1 entry, Zig</h3><p>There isn't an official Zig tutorial on io_uring that I'm aware of. But <a href="https://github.com/ziglang/zig/blob/master/lib/std/os/linux/io_uring.zig">io_uring.zig</a> is easy enough to browse through. 
And there are tests in that file that also show how to use it.</p> <p>Now that we've explored a bit in Go, the basic gist should be similar:</p> <ul> <li>initialize io_uring</li> <li>submit an entry</li> <li>wait for it to finish</li> <li>move on</li> </ul> <p>Add the following to <code>fn main()</code> after the existing benchmark block in <code>main.zig</code>:</p> <div class="highlight"><pre><span></span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;io_uring&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">entries</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">entries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span 
class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span 
class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">submitted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">submitted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cqe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqe</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">SUCCESS</span><span 
class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> 
</thead> <tbody> <tr> <td>blocking</td> <td>0.06650093630000001s</td> <td>1.5GB/s</td> </tr> <tr> <td>io_uring</td> <td>0.17542890139999998s</td> <td>597.7MB/s</td> </tr> </tbody> </table> <p>Well, it's similarly pretty bad. But our implementation ignores one major aspect of io_uring: batching requests.</p> <p>Let's do some refactoring!</p> <h3 id="io_uring,-n-entries,-go">io_uring, N entries, Go</h3><p>To support submitting N entries, we're going to add an inner loop that prepares up to N entries for io_uring.</p> <p>Then we'll wait for the N submissions to complete and check their results.</p> <p>We'll keep going until we've written the entire file.</p> <p>All of this can stay inside the loop in <code>main</code>; I'm just dropping the leading whitespace for nicer formatting here:</p> <div class="highlight"><pre><span></span><span class="nx">benchmarkIOUringNEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">nEntries</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">benchmark</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;io_uring_%d_entries&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">),</span><span class="w"> </span><span class="nx">data</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">iour</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iouring</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">nEntries</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iour</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">requests</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">iouring</span><span class="p">.</span><span class="nx">PrepRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span><span class="w"> </span><span 
class="o">*</span><span class="w"> </span><span class="nx">nEntries</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">submittedEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">nEntries</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">BUFFER_SIZE</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">submittedEntries</span><span class="o">++</span> <span class="w"> </span><span class="nx">size</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span 
class="p">(</span><span class="nx">data</span><span class="p">)</span><span class="o">-</span><span class="nx">base</span><span class="p">)</span> <span class="w"> </span><span class="nx">requests</span><span class="p">[</span><span class="nx">j</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">iouring</span><span class="p">.</span><span class="nx">Pwrite</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="nx">f</span><span class="p">.</span><span class="nx">Fd</span><span class="p">()),</span><span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="nx">base</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">base</span><span class="o">+</span><span class="nx">size</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">base</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">submittedEntries</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iour</span><span class="p">.</span><span class="nx">SubmitRequests</span><span class="p">(</span><span class="nx">requests</span><span class="p">[:</span><span class="nx">submittedEntries</span><span class="p">],</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span 
class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">&lt;-</span><span class="nx">res</span><span class="p">.</span><span class="nx">Done</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">ErrResults</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">result</span><span class="p">.</span><span class="nx">ReturnInt</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">})</span> <span class="p">}</span> <span class="nx">benchmarkIOUringNEntries</span><span class="p">(</span><span 
class="mi">1</span><span class="p">)</span> <span class="nx">benchmarkIOUringNEntries</span><span class="p">(</span><span class="mi">128</span><span class="p">)</span> </pre></div> <p>There are a few specific things to notice in there.</p> <p>First, toward the end of the file we may not have <code>N</code> entries to submit. We may have <code>1</code> or even <code>0</code>.</p> <p>If we have <code>0</code> to submit, we must skip the submission entirely, otherwise the Go library hangs. Similarly, if we don't slice <code>requests</code> to <code>requests[:submittedEntries]</code>, the Go library will segfault when <code>submittedEntries &lt; N</code>.</p> <p>Other than that, let's build and run this!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>gomain $<span class="w"> </span>./gomain<span class="w"> </span>&gt;<span class="w"> </span>go.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;go.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>blocking</td> <td>0.0740368s</td> <td>1.4GB/s</td> </tr> <tr> <td>io_uring_128_entries</td> <td>0.127519s</td> <td>836.6MB/s</td> </tr> <tr> <td>io_uring_1_entries</td> <td>0.46831579999999995s</td> <td>226.9MB/s</td> </tr> </tbody> </table> <p>Now we're getting somewhere! 
Still only a bit more than half the throughput of blocking writes, but nearly a 4x improvement over using a single entry.</p> <p>Let's port the N entry code to Zig.</p> <h3 id="io_uring,-n-entries,-zig">io_uring, N entries, Zig</h3><p>Unlike Go, Zig doesn't have closures, so we'll have to make <code>benchmarkIOUringNEntries</code> a top-level function and keep the calls to it in the loop in <code>main</code>:</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">;</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SIZE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">104857600</span><span class="p">;</span><span class="w"> </span><span class="c1">// 100MiB</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">readNBytes</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;/dev/random&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">SIZE</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span 
class="n">free</span><span class="p">(</span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RUNS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">run</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">RUNS</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">run</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;blocking&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span> <span class="w"> </span><span 
class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="n">i</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> 
</span><span class="n">size</span><span class="p">]);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">size</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">,</span><span class="w"> </span><span class="mi">128</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And for the implementation itself, the only two big differences from the first version are that we'll bulk-read completion events (<code>cqe</code>s) and that we'll create and wait for many submissions at once.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> 
<span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">nEntries</span><span class="o">:</span><span class="w"> </span><span class="n">u13</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">allocator</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;iouring_{}_entries&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">nEntries</span><span class="p">});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span> 
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">nEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cqes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">io_uring_cqe</span><span class="p">,</span><span class="w"> </span><span class="n">nEntries</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">cqes</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span 
class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">nEntries</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">submittedEntries</span><span class="o">:</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">j</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">nEntries</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">base</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">submittedEntries</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">base</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span 
class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">base</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">base</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">submitted</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="n">submittedEntries</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">submitted</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">submittedEntries</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">waited</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqes</span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">submitted</span><span class="p">],</span><span class="w"> </span><span class="n">submitted</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span 
class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">waited</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">submitted</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">submitted</span><span class="p">])</span><span class="w"> </span><span class="o">|*</span><span class="n">cqe</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">SUCCESS</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span 
class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Let's build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>blocking</td> <td>0.0674331114s</td> <td>1.5GB/s</td> </tr> <tr> <td>iouring_128_entries</td> <td>0.06773539590000001s</td> <td>1.5GB/s</td> </tr> <tr> <td>iouring_1_entries</td> <td>0.1855542556s</td> <td>569.9MB/s</td> </tr> </tbody> </table> <p>Huh, that's surprising! We caught up to blocking writes with io_uring in Zig, but not in Go, even though we made good progress in Go.</p> <h3 id="ring-buffers">Ring buffers</h3><p>But we can do a bit better. We're doing batching, but the API is called "io_uring" not "io_batch". We're not even making use of the ring buffer behavior io_uring gives us!</p> <p>We are waiting for all submitted results to complete. But there's no reason to do that. Instead we should submit as much as we can. But we should not block waiting for completions. We should handle completions when they happen. 
And we should keep submitting until all the data is written, retrying whenever the submission queue has no space for the moment.</p> <p>Unfortunately the Go library doesn't seem to expose this ring behavior of io_uring. Or I've missed it.</p> <p>But we can do it in Zig. Let's go.</p> <h3 id="io_uring,-ring-buffer,-zig">io_uring, ring buffer, Zig</h3><p>We need to change the way we track which offsets we have submitted so far. We also need to keep the loop going until we are sure we have <em>written</em> all data. And we need to stop blocking on the number of entries we submitted; in fact, we never block at all.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">benchmarkIOUringNEntries</span><span class="p">(</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">nEntries</span><span class="o">:</span><span class="w"> </span><span class="n">u13</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span 
class="n">allocator</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;iouring_{}_entries&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">nEntries</span><span class="p">});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">name</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">Benchmark</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">stop</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ring</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">IO_Uring</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">nEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span 
class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cqes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">io_uring_cqe</span><span class="p">,</span><span class="w"> </span><span class="n">nEntries</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">free</span><span class="p">(</span><span class="n">cqes</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">written</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">or</span><span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span 
class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">submittedEntries</span><span class="o">:</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">j</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">j</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">base</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="nb">@min</span><span class="p">(</span><span class="n">BUFFER_SIZE</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">base</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">write</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">file</span><span class="p">.</span><span class="n">handle</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">base</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">size</span><span class="p">],</span><span class="w"> </span><span class="n">base</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">SubmissionQueueFull</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">break</span><span class="p">,</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> 
</span><span class="p">};</span> <span class="w"> </span><span class="n">submittedEntries</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">size</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">submit_and_wait</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cqesDone</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">ring</span><span class="p">.</span><span class="n">copy_cqes</span><span class="p">(</span><span class="n">cqes</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cqes</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">cqesDone</span><span class="p">])</span><span class="w"> </span><span class="o">|*</span><span class="n">cqe</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">err</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="p">.</span><span class="n">SUCCESS</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@as</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="nb">@intCast</span><span class="p">(</span><span class="n">cqe</span><span class="p">.</span><span class="n">res</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">BUFFER_SIZE</span><span class="p">);</span> <span class="w"> </span><span class="n">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">n</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>The code got a bit simpler! 
Granted, we're omitting error handling.</p> <p>Build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>iouring_128_entries</td> <td>0.06035423609999999s</td> <td>1.7GB/s</td> </tr> <tr> <td>iouring_1_entries</td> <td>0.0610197624s</td> <td>1.7GB/s</td> </tr> <tr> <td>blocking</td> <td>0.0671628515s</td> <td>1.5GB/s</td> </tr> </tbody> </table> <p>Not bad!</p> <h3 id="crank-it-up">Crank it up</h3><p>We've been inserting 100MiB of data. Let's go up to 1GiB to see how that affects things. Ideally the more data we write the more we reflect realistic long-term results.</p> <p>In <code>main.zig</code> just change <code>SIZE</code> to <code>1073741824</code>. 
Rebuild and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>iouring_128_entries</td> <td>0.6063814535s</td> <td>1.7GB/s</td> </tr> <tr> <td>iouring_1_entries</td> <td>0.6167537295000001s</td> <td>1.7GB/s</td> </tr> <tr> <td>blocking</td> <td>0.6831747749s</td> <td>1.5GB/s</td> </tr> </tbody> </table> <p>No real difference in throughput, perfect!</p> <p>Let's make one more change though. 
Let's up the <code>BUFFER_SIZE</code> from 4KiB to 1MiB.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>&gt;<span class="w"> </span>zig.csv $<span class="w"> </span>duckdb<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;select column0 as method, avg(cast(column1 as double)) || &#39;s&#39; avg_time, format_bytes(avg(column2::double)::bigint) || &#39;/s&#39; as avg_throughput from &#39;zig.csv&#39; group by column0 order by avg(cast(column1 as double)) asc&quot;</span> </pre></div> <table> <thead><tr> <th>method</th> <th>avg_time</th> <th>avg_throughput</th> </tr> </thead> <tbody> <tr> <td>iouring_128_entries</td> <td>0.2756831357s</td> <td>3.8GB/s</td> </tr> <tr> <td>iouring_1_entries</td> <td>0.27575404880000004s</td> <td>3.8GB/s</td> </tr> <tr> <td>blocking</td> <td>0.2833337046s</td> <td>3.7GB/s</td> </tr> </tbody> </table> <p>Hey, that's an improvement!</p> <h3 id="control">Control</h3><p>All these numbers are machine-specific, obviously. So what does an existing tool like <a href="https://fio.readthedocs.io/en/latest/fio_doc.html">fio</a> say? (Assuming I'm using it correctly. 
I await your corrections!)</p> <p>With a 4KiB buffer size:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>fio<span class="w"> </span>--name<span class="o">=</span>fiotest<span class="w"> </span>--rw<span class="o">=</span>write<span class="w"> </span>--size<span class="o">=</span>1G<span class="w"> </span>--bs<span class="o">=</span>4k<span class="w"> </span>--group_reporting<span class="w"> </span>--ioengine<span class="o">=</span>sync fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">g</span><span class="o">=</span><span class="m">0</span><span class="o">)</span>:<span class="w"> </span><span class="nv">rw</span><span class="o">=</span>write,<span class="w"> </span><span class="nv">bs</span><span class="o">=(</span>R<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="o">(</span>W<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="o">(</span>T<span class="o">)</span><span class="w"> </span>4096B-4096B,<span class="w"> </span><span class="nv">ioengine</span><span class="o">=</span>sync,<span class="w"> </span><span class="nv">iodepth</span><span class="o">=</span><span class="m">1</span> fio-3.33 Starting<span class="w"> </span><span class="m">1</span><span class="w"> </span>process Jobs:<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">(</span><span class="nv">f</span><span class="o">=</span><span class="m">1</span><span class="o">)</span> fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">groupid</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">jobs</span><span class="o">=</span><span class="m">1</span><span class="o">)</span>:<span class="w"> </span><span class="nv">err</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>:<span class="w"> </span><span class="nv">pid</span><span 
class="o">=</span><span class="m">2437359</span>:<span class="w"> </span>Thu<span class="w"> </span>Oct<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="m">23</span>:33:42<span class="w"> </span><span class="m">2023</span> <span class="w"> </span>write:<span class="w"> </span><span class="nv">IOPS</span><span class="o">=</span>282k,<span class="w"> </span><span class="nv">BW</span><span class="o">=</span>1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s<span class="o">)(</span>1024MiB/929msec<span class="o">)</span><span class="p">;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>zone<span class="w"> </span>resets <span class="w"> </span>clat<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">2349</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">54099</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">2709</span>.48,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">1325</span>.83 <span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">2390</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">54139</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">2752</span>.89,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">1334</span>.62 <span class="w"> </span>clat<span class="w"> </span>percentiles<span class="w"> </span><span class="o">(</span>nsec<span class="o">)</span>: <span class="w"> </span><span class="p">|</span><span class="w"> </span><span 
class="m">1</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">5</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">10</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2416</span><span class="o">]</span>,<span class="w"> </span><span class="m">20</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">30</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">40</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">50</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2448</span><span class="o">]</span>,<span class="w"> </span><span class="m">60</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2480</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">70</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2512</span><span class="o">]</span>,<span class="w"> </span><span class="m">80</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2544</span><span class="o">]</span>,<span class="w"> </span><span class="m">90</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">2832</span><span class="o">]</span>,<span class="w"> </span><span class="m">95</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">3504</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> 
</span><span class="m">99</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">5792</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.50th<span class="o">=[</span><span class="m">15296</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.90th<span class="o">=[</span><span class="m">19584</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.95th<span class="o">=[</span><span class="m">20096</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.99th<span class="o">=[</span><span class="m">22656</span><span class="o">]</span> <span class="w"> </span>bw<span class="w"> </span><span class="o">(</span><span class="w"> </span>KiB/s<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">940856</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">940856</span>,<span class="w"> </span><span class="nv">per</span><span class="o">=</span><span class="m">83</span>.36%,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">940856</span>.00,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>.00,<span class="w"> </span><span class="nv">samples</span><span class="o">=</span><span class="m">1</span> <span class="w"> </span>iops<span class="w"> </span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">235214</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">235214</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">235214</span>.00,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="w"> </span><span 
class="m">0</span>.00,<span class="w"> </span><span class="nv">samples</span><span class="o">=</span><span class="m">1</span> <span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">97</span>.22%,<span class="w"> </span><span class="nv">10</span><span class="o">=</span><span class="m">2</span>.03%,<span class="w"> </span><span class="nv">20</span><span class="o">=</span><span class="m">0</span>.71%,<span class="w"> </span><span class="nv">50</span><span class="o">=</span><span class="m">0</span>.04%,<span class="w"> </span><span class="nv">100</span><span class="o">=</span><span class="m">0</span>.01% <span class="w"> </span>cpu<span class="w"> </span>:<span class="w"> </span><span class="nv">usr</span><span class="o">=</span><span class="m">17</span>.35%,<span class="w"> </span><span class="nv">sys</span><span class="o">=</span><span class="m">82</span>.11%,<span class="w"> </span><span class="nv">ctx</span><span class="o">=</span><span class="m">26</span>,<span class="w"> </span><span class="nv">majf</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">minf</span><span class="o">=</span><span class="m">11</span> <span class="w"> </span>IO<span class="w"> </span>depths<span class="w"> </span>:<span class="w"> </span><span class="nv">1</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span 
class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0% <span class="w"> </span>submit<span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0% <span class="w"> </span><span class="nb">complete</span><span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0% <span class="w"> </span>issued<span class="w"> </span>rwts:<span class="w"> </span><span class="nv">total</span><span class="o">=</span><span class="m">0</span>,262144,0,0<span class="w"> </span><span 
class="nv">short</span><span class="o">=</span><span class="m">0</span>,0,0,0<span class="w"> </span><span class="nv">dropped</span><span class="o">=</span><span class="m">0</span>,0,0,0 <span class="w"> </span>latency<span class="w"> </span>:<span class="w"> </span><span class="nv">target</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">window</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">percentile</span><span class="o">=</span><span class="m">100</span>.00%,<span class="w"> </span><span class="nv">depth</span><span class="o">=</span><span class="m">1</span> Run<span class="w"> </span>status<span class="w"> </span>group<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">(</span>all<span class="w"> </span><span class="nb">jobs</span><span class="o">)</span>: <span class="w"> </span>WRITE:<span class="w"> </span><span class="nv">bw</span><span class="o">=</span>1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s<span class="o">)</span>,<span class="w"> </span>1102MiB/s-1102MiB/s<span class="w"> </span><span class="o">(</span>1156MB/s-1156MB/s<span class="o">)</span>,<span class="w"> </span><span class="nv">io</span><span class="o">=</span>1024MiB<span class="w"> </span><span class="o">(</span>1074MB<span class="o">)</span>,<span class="w"> </span><span class="nv">run</span><span class="o">=</span><span class="m">929</span>-929msec </pre></div> <p>1.2GB/s is about in the ballpark of what we got.</p> <p>And with a 1MiB buffer size?</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>fio<span class="w"> </span>--name<span class="o">=</span>fiotest<span class="w"> </span>--rw<span class="o">=</span>write<span class="w"> </span>--size<span class="o">=</span>1G<span class="w"> </span>--bs<span class="o">=</span>1M<span class="w"> </span>--group_reporting<span class="w"> </span>--ioengine<span 
class="o">=</span>sync fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">g</span><span class="o">=</span><span class="m">0</span><span class="o">)</span>:<span class="w"> </span><span class="nv">rw</span><span class="o">=</span>write,<span class="w"> </span><span class="nv">bs</span><span class="o">=(</span>R<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="o">(</span>W<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="o">(</span>T<span class="o">)</span><span class="w"> </span>1024KiB-1024KiB,<span class="w"> </span><span class="nv">ioengine</span><span class="o">=</span>sync,<span class="w"> </span><span class="nv">iodepth</span><span class="o">=</span><span class="m">1</span> fio-3.33 Starting<span class="w"> </span><span class="m">1</span><span class="w"> </span>process fiotest:<span class="w"> </span>Laying<span class="w"> </span>out<span class="w"> </span>IO<span class="w"> </span>file<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="w"> </span>file<span class="w"> </span>/<span class="w"> </span>1024MiB<span class="o">)</span> fiotest:<span class="w"> </span><span class="o">(</span><span class="nv">groupid</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">jobs</span><span class="o">=</span><span class="m">1</span><span class="o">)</span>:<span class="w"> </span><span class="nv">err</span><span class="o">=</span><span class="w"> </span><span class="m">0</span>:<span class="w"> </span><span class="nv">pid</span><span class="o">=</span><span class="m">2437239</span>:<span class="w"> </span>Thu<span class="w"> </span>Oct<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="m">23</span>:32:09<span class="w"> </span><span class="m">2023</span> <span class="w"> </span>write:<span class="w"> </span><span class="nv">IOPS</span><span 
class="o">=</span><span class="m">3953</span>,<span class="w"> </span><span class="nv">BW</span><span class="o">=</span>3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s<span class="o">)(</span>1024MiB/259msec<span class="o">)</span><span class="p">;</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>zone<span class="w"> </span>resets <span class="w"> </span>clat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">221</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">1205</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">241</span>.83,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">43</span>.93 <span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>:<span class="w"> </span><span class="nv">min</span><span class="o">=</span><span class="m">228</span>,<span class="w"> </span><span class="nv">max</span><span class="o">=</span><span class="m">1250</span>,<span class="w"> </span><span class="nv">avg</span><span class="o">=</span><span class="m">251</span>.68,<span class="w"> </span><span class="nv">stdev</span><span class="o">=</span><span class="m">45</span>.80 <span class="w"> </span>clat<span class="w"> </span>percentiles<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span>: <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">225</span><span class="o">]</span>,<span class="w"> </span><span class="m">5</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">225</span><span class="o">]</span>,<span class="w"> </span><span class="m">10</span>.00th<span 
class="o">=[</span><span class="w"> </span><span class="m">227</span><span class="o">]</span>,<span class="w"> </span><span class="m">20</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">227</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">30</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">231</span><span class="o">]</span>,<span class="w"> </span><span class="m">40</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">233</span><span class="o">]</span>,<span class="w"> </span><span class="m">50</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">235</span><span class="o">]</span>,<span class="w"> </span><span class="m">60</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">239</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">70</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">243</span><span class="o">]</span>,<span class="w"> </span><span class="m">80</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">249</span><span class="o">]</span>,<span class="w"> </span><span class="m">90</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">262</span><span class="o">]</span>,<span class="w"> </span><span class="m">95</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">269</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.00th<span class="o">=[</span><span class="w"> </span><span class="m">302</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.50th<span class="o">=[</span><span class="w"> </span><span class="m">318</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.90th<span 
class="o">=[</span><span class="w"> </span><span class="m">1074</span><span class="o">]</span>,<span class="w"> </span><span class="m">99</span>.95th<span class="o">=[</span><span class="w"> </span><span class="m">1205</span><span class="o">]</span>, <span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">99</span>.99th<span class="o">=[</span><span class="w"> </span><span class="m">1205</span><span class="o">]</span> <span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>usec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">250</span><span class="o">=</span><span class="m">80</span>.96%,<span class="w"> </span><span class="nv">500</span><span class="o">=</span><span class="m">18</span>.85% <span class="w"> </span>lat<span class="w"> </span><span class="o">(</span>msec<span class="o">)</span><span class="w"> </span>:<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.20% <span class="w"> </span>cpu<span class="w"> </span>:<span class="w"> </span><span class="nv">usr</span><span class="o">=</span><span class="m">4</span>.26%,<span class="w"> </span><span class="nv">sys</span><span class="o">=</span><span class="m">94</span>.96%,<span class="w"> </span><span class="nv">ctx</span><span class="o">=</span><span class="m">3</span>,<span class="w"> </span><span class="nv">majf</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">minf</span><span class="o">=</span><span class="m">10</span> <span class="w"> </span>IO<span class="w"> </span>depths<span class="w"> </span>:<span class="w"> </span><span class="nv">1</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">2</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> 
</span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0% <span class="w"> </span>submit<span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0% <span class="w"> </span><span class="nb">complete</span><span class="w"> </span>:<span class="w"> </span><span class="nv">0</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">4</span><span class="o">=</span><span class="m">100</span>.0%,<span class="w"> </span><span class="nv">8</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">16</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">32</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span><span class="nv">64</span><span class="o">=</span><span class="m">0</span>.0%,<span class="w"> </span>&gt;<span class="o">=</span><span class="nv">64</span><span 
class="o">=</span><span class="m">0</span>.0% <span class="w"> </span>issued<span class="w"> </span>rwts:<span class="w"> </span><span class="nv">total</span><span class="o">=</span><span class="m">0</span>,1024,0,0<span class="w"> </span><span class="nv">short</span><span class="o">=</span><span class="m">0</span>,0,0,0<span class="w"> </span><span class="nv">dropped</span><span class="o">=</span><span class="m">0</span>,0,0,0 <span class="w"> </span>latency<span class="w"> </span>:<span class="w"> </span><span class="nv">target</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">window</span><span class="o">=</span><span class="m">0</span>,<span class="w"> </span><span class="nv">percentile</span><span class="o">=</span><span class="m">100</span>.00%,<span class="w"> </span><span class="nv">depth</span><span class="o">=</span><span class="m">1</span> Run<span class="w"> </span>status<span class="w"> </span>group<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">(</span>all<span class="w"> </span><span class="nb">jobs</span><span class="o">)</span>: <span class="w"> </span>WRITE:<span class="w"> </span><span class="nv">bw</span><span class="o">=</span>3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s<span class="o">)</span>,<span class="w"> </span>3954MiB/s-3954MiB/s<span class="w"> </span><span class="o">(</span>4146MB/s-4146MB/s<span class="o">)</span>,<span class="w"> </span><span class="nv">io</span><span class="o">=</span>1024MiB<span class="w"> </span><span class="o">(</span>1074MB<span class="o">)</span>,<span class="w"> </span><span class="nv">run</span><span class="o">=</span><span class="m">259</span>-259msec </pre></div> <p>3.9GB/s is also roughly in the same ballpark we got.</p> <p>Our code seems reasonable!</p> <h3 id="what's-next?">What's next?</h3><p>None of this is original. 
<code>fio</code> is a similar tool, written in C, with many different IO engines including <code>libaio</code> and <code>writev</code> support, and many different IO workloads.</p> <p>But it's been enjoyable to learn more about these APIs: how to program them and how they compare to each other.</p> <p>So next steps could include adding more IO engines or IO workloads.</p> <p>Also, either I need to understand Iceber's Go library better or its API needs to be loosened up a little so that we can get the same awesome ring buffer behavior in Go that we could use from Zig.</p> <p>Keep an eye out here and on my <a href="https://github.com/eatonphil/io-playground">io-playground repo</a>!</p> <h3 id="selected-responses-after-publication">Selected responses after publication</h3><ul> <li>wizeman on lobsters <a href="https://lobste.rs/s/rimkv3/io_uring_basics_writing_file_disk#c_qvlx5u">suggests</a> measuring at least 30 seconds' worth of writes and <code>fsync()</code>-ing if you want to test the entire IO subsystem and not just the kernel cache.</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Digging into io_uring has been on my list for a long time now! 
This week I finally made made some progress.<br><br>Let&#39;s go on a little journey through a few increasingly complex (and useful) implementations of writing a file to disk with io_uring.<a href="https://t.co/gR9K2OQs2R">https://t.co/gR9K2OQs2R</a> <a href="https://t.co/TMaC8QYL6k">pic.twitter.com/TMaC8QYL6k</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1715151609615773965?ref_src=twsrc%5Etfw">October 19, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-10-19-write-file-to-disk-with-io_uring.htmlThu, 19 Oct 2023 00:00:00 +0000Go database driver overhead on insert-heavy workloadshttp://notes.eatonphil.com/2023-10-05-go-database-sql-overhead-on-insert-heavy-workloads.html<p>The most popular SQLite and PostgreSQL database drivers in Go are (roughly) 20-76% slower than alternative Go drivers on insert-heavy benchmarks of mine. So if you are bulk-inserting data with Go (and potentially also bulk-retrieving data with Go), you may want to consider the driver carefully. And you may want to consider avoiding <code>database/sql</code>.</p> <p>Some driver authors have <a href="https://github.com/lib/pq/issues/771">noted</a> and <a href="https://github.com/ClickHouse/clickhouse-go/tree/main#benchmark">benchmarked</a> issues with <a href="https://github.com/jackc/pgx#choosing-between-the-pgx-and-databasesql-interfaces">database/sql</a>.</p> <p>So it may be the case that <code>database/sql</code> is responsible for some of this overhead. And indeed the variations between drivers in this post will be demonstrated by using <code>database/sql</code> and avoiding it. This post won't specifically prove that the variation is due to the <code>database/sql</code> interface. 
But that doesn't change the premise.</p> <p class="note"> Not covered in this post but something to consider: JetBrains <a href="https://blog.jetbrains.com/go/2023/04/27/comparing-db-packages/">has suggested</a> that other frontends like sqlc, sqlx, and GORM do worse than <code>database/sql</code>. </p><p>This post is built on the workload, environment, libraries, and methodology in my <a href="https://github.com/eatonphil/databases-intuition">databases-intuition repo on GitHub</a>. See the repo for details that will help you reproduce or correct me.</p> <h3 id="insert-workload">INSERT workload</h3><p>In this workload, the data is random and there are no indexes. Neither of these aspects matters for this post, though, because we're comparing behavior within the same database among different drivers. This was just a workload I already had.</p> <p>Two different data sizes are tested:</p> <ol> <li>10M rows with 16 columns, each column is 32 bytes</li> <li>10M rows with 3 columns, each column is 8 bytes</li> </ol> <p>Each test is run 10 times and we record median, standard deviation, min, max and throughput.</p> <h3 id="sqlite">SQLite</h3><p>Both variations presented here load 10M rows using a single prepared statement called for each row within a single transaction.</p> <p>The most popular driver is <a href="https://github.com/mattn/go-sqlite3">mattn/go-sqlite3</a>.</p> <p>It is roughly 20-40% slower than another driver that avoids <code>database/sql</code>.</p> <p>10M Rows, 16 columns, each column 32 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">56</span>.53<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.26s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">55</span>.05s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">59</span>.62s Throughput:<span class="w"> </span><span class="m">176</span>,893.65<span class="w"> </span>±<span class="w"> </span><span 
class="m">3</span>,853.90<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">167</span>,719.97<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">181</span>,646.02<span class="w"> </span>rows/s </pre></div> <p>10M Rows, 3 columns, each column 8 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">15</span>.92<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.25s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">15</span>.69s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">16</span>.67s Throughput:<span class="w"> </span><span class="m">628</span>,044.37<span class="w"> </span>±<span class="w"> </span><span class="m">9</span>,703.92<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">599</span>,852.91<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">637</span>,435.60<span class="w"> </span>rows/s </pre></div> <p>The other driver I tested is my own fork of <a href="https://github.com/bvinc/go-sqlite-lite">bvinc/go-sqlite-lite</a> called <a href="https://github.com/eatonphil/gosqlite">eatonphil/gosqlite</a>. 
I forked it because it is unmaintained and I wanted to bring it up-to-date for tests like this.</p> <p>10M Rows, 16 columns, each column 32 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">45</span>.51<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.70s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">43</span>.72s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">45</span>.93s Throughput:<span class="w"> </span><span class="m">219</span>,729.65<span class="w"> </span>±<span class="w"> </span><span class="m">3</span>,447.56<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">217</span>,742.98<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">228</span>,711.51<span class="w"> </span>rows/s </pre></div> <p>10M Rows, 3 columns, each column 8 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">10</span>.44<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.20s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">10</span>.02s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">10</span>.68s Throughput:<span class="w"> </span><span class="m">957</span>,939.60<span class="w"> </span>±<span class="w"> </span><span class="m">18</span>,879.43<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">936</span>,114.60<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">998</span>,426.62<span class="w"> </span>rows/s </pre></div> <h3 id="postgresql">PostgreSQL</h3><p>Both variations presented use PostgreSQL's <a href="https://www.postgresql.org/docs/current/sql-copy.html"><code>COPY FROM</code></a> support. 
This is significantly faster for PostgreSQL than doing the prepared statement we do in SQLite. (<a href="https://github.com/eatonphil/databases-intuition#postgresql-prepared-insert">Here</a> are my results for doing prepared statement INSERTs in PostgreSQL if you are curious.)</p> <p>The most popular PostgreSQL driver is <a href="https://github.com/lib/pq">lib/pq</a>. The <a href="https://github.com/lib/pq/issues/771">performance issues</a> with lib/pq are <a href="https://github.com/jackc/pgx#choosing-between-the-pgx-and-databasesql-interfaces">well-known</a>, and the <a href="https://github.com/lib/pq#status">repo itself</a> is marked as no longer developed.</p> <p>It is roughly 44-76% slower than an alternative driver that avoids <code>database/sql</code>.</p> <p>10M Rows, 16 columns, each column 32 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">104</span>.53<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.40s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">102</span>.57s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">110</span>.08s Throughput:<span class="w"> </span><span class="m">95</span>,665.37<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>,129.25<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">90</span>,847.08<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">97</span>,490.96<span class="w"> </span>rows/s </pre></div> <p>10M Rows, 3 columns, each column 8 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">8</span>.16<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.43s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">7</span>.44s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">8</span>.80s Throughput:<span 
class="w"> </span><span class="m">1</span>,225,986.47<span class="w"> </span>±<span class="w"> </span><span class="m">66</span>,631.53<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">1</span>,136,581.82<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">1</span>,343,441.37<span class="w"> </span>rows/s </pre></div> <p>The other driver I tested is <a href="https://github.com/jackc/pgx">jackc/pgx</a>, without <code>database/sql</code>.</p> <p>10M Rows, 16 columns, each column 32 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">46</span>.54<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.60s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">44</span>.09s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">49</span>.51s Throughput:<span class="w"> </span><span class="m">214</span>,869.42<span class="w"> </span>±<span class="w"> </span><span class="m">7</span>,265.10<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">201</span>,991.37<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">226</span>,801.07<span class="w"> </span>rows/s </pre></div> <p>10M Rows, 3 columns, each column 8 bytes:</p> <div class="highlight"><pre><span></span>Timing:<span class="w"> </span><span class="m">5</span>.20<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.44s,<span class="w"> </span>Min:<span class="w"> </span><span class="m">4</span>.71s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">5</span>.96s Throughput:<span class="w"> </span><span class="m">1</span>,923,722.79<span class="w"> </span>±<span class="w"> </span><span class="m">156</span>,820.46<span class="w"> </span>rows/s,<span class="w"> </span>Min:<span class="w"> </span><span 
class="m">1</span>,676,894.32<span class="w"> </span>rows/s,<span class="w"> </span>Max:<span class="w"> </span><span class="m">2</span>,124,966.60<span class="w"> </span>rows/s </pre></div> <p>The discrepancies here are even greater than with the different SQLite drivers.</p> <h3 id="workloads-with-small-resultset">Workloads with small resultset</h3><p>I won't go into as much detail but if you're doing queries that don't return many rows, the difference between drivers is negligible.</p> <p>See <a href="https://github.com/eatonphil/databases-intuition#selects">here</a> for details.</p> <h3 id="conclusion">Conclusion</h3><p>If you are doing INSERT-heavy workloads, or you are processing a large number of rows returned from your SQL database, you might want to try benchmarking the same workload with different drivers.</p> <p>And specifically, there is likely no good reason to use <code>lib/pq</code> anymore for accessing PostgreSQL from Go. Just use jackc/pgx.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">For INSERT-heavy workloads in Go, you may want to switch database drivers. For PostgreSQL and SQLite, the popular drivers are 20-76% slower for this workload in my tests.<br><br>Some driver developers have reported issues with database/sql as an interface.<a href="https://t.co/NLVp0P2uiV">https://t.co/NLVp0P2uiV</a> <a href="https://t.co/RxTbgMZ1MG">pic.twitter.com/RxTbgMZ1MG</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1710249941904351718?ref_src=twsrc%5Etfw">October 6, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-10-05-go-database-sql-overhead-on-insert-heavy-workloads.htmlThu, 05 Oct 2023 00:00:00 +0000Intercepting and modifying Linux system calls with ptracehttp://notes.eatonphil.com/2023-10-01-intercepting-and-modifying-linux-system-calls-with-ptrace.html<p>How software fails is interesting. 
But real-world errors can take a long time to manifest. <a href="https://course.ece.cmu.edu/~ece749/docs/faultInjectionSurvey.pdf">Fault injection</a> is a formal-sounding term that just means: trying to explicitly trigger errors in the hopes of discovering bad logic, typically during automated tests.</p> <p><a href="https://github.com/jepsen-io/jepsen">Jepsen</a> and <a href="https://github.com/Netflix/chaosmonkey">ChaosMonkey</a> are two famous examples that help to trigger process and network failure. But what about disk and filesystem errors?</p> <p>A few avenues seem worth investigating:</p> <ul> <li>A custom FUSE filesystem</li> <li>An LD_PRELOAD interception layer</li> <li>A ptrace system call interception layer</li> <li>A <code>SECCOMP_RET_TRAP</code> interception layer</li> <li>Or, symbolic analysis a la <a href="https://research.cs.wisc.edu/adsl/Publications/alice-osdi14.html">Alice from University of Wisconsin-Madison</a></li> </ul> <p>I would like to try out FUSE sometime. But an LD_PRELOAD layer only works if IO goes through libc, which won't be the case for all programs. ptrace is something I've wanted to dig into for years since learning about <a href="https://www.usenix.org/system/files/hotcloud19-paper-young.pdf">gvisor</a>.</p> <p><code>SECCOMP_RET_TRAP</code> doesn't have the same high-level guides that ptrace does so maybe I'll dig into it later. And symbolic analysis might be able to detect bad workloads but it also isn't fault injection. Maybe it's the better idea but fault injection just sounds more fun.</p> <p>So this particular post will cover intercepting system calls (syscalls) using ptrace with code written in Zig. 
Not because readers will likely write their own code in Zig but because hopefully the Zig code will be easier for you to read and adapt to your language compared to if we had to deal with the verbosity and inconvenience of C.</p> <p>In the end, we'll be able to intercept and force short (incomplete) writes in a Go, Python, and C program. Emulating a disk that is having an issue completing the write. This is a case that isn't common, but should probably be handled with retries in production code.</p> <p>This post corresponds roughly to <a href="https://github.com/eatonphil/badio/tree/720c3ee0482e6dcb1dd49d1789bccf86747b7776">this commit</a> on GitHub.</p> <h3 id="a-bad-program">A bad program</h3><p>First off, let's write some code for a program that would exhibit a short write. Basically, we write to a file and don't check how many bytes we wrote. This is extremely common code; or at least I've written it often.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">main</span><span class="p">.</span><span class="k">go</span> <span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span><span class="s">&quot;test.txt&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_TRUNC</span><span class="p">,</span><span class="w"> </span><span class="mo">0755</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;some great stuff&quot;</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">text</span><span class="p">))</span> <span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="p">}</span> </pre></div> <p>With this code, if the <code>Write()</code> call doesn't actually succeed in writing everything, we won't know that. And the file will contain less than all of <code>some great stuff</code>.</p> <p>This logical mistake will happen rarely, if ever, on a normal disk. 
But it is possible.</p> <p>Now that we've got an example program in mind, let's see if we can trigger the logic error.</p> <h3 id="ptrace">ptrace</h3><p>ptrace is a somewhat cross-platform layer that allows you to intercept syscalls in a process. You can read and modify memory and registers in the process, when the syscall starts and before it finishes.</p> <p>gdb and strace both use ptrace for their magic.</p> <p>Google's gvisor that <a href="https://cloud.google.com/run/docs/container-contract">powers various serverless runtimes in Google Cloud</a> was also historically based on ptrace (<code>PTRACE_SYSEMU</code> specifically, which we won't explore much in this post).</p> <p class="note"> Interestingly though, gvisor <a href="https://gvisor.dev/blog/2023/04/28/systrap-release/">switched only this year</a> (2023) to a different default backend for trapping system calls, based on <a href="https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt"><code>SECCOMP_RET_TRAP</code></a>. <br /> <br /> You can get similar vibes from <a href="https://www.brendangregg.com/blog/2014-05-11/strace-wow-much-syscall.html">this Brendan Gregg post</a> on the dangers of using strace (that is based on ptrace) in production. </p><p>Although ptrace is cross-platform, actually writing cross-platform-aware code with ptrace can be complex. So this post assumes amd64/linux.</p> <h3 id="protocol">Protocol</h3><p>The ptrace protocol is described in the <a href="https://man7.org/linux/man-pages/man2/ptrace.2.html">ptrace manpage</a>, but <a href="https://nullprogram.com/blog/2018/06/23/">Chris Wellons</a> and <a href="https://webdocs.cs.ualberta.ca/~paullu/C498/meng.ptrace.slides.pdf">a University of Alberta group</a> also wrote nice introductions. 
I referenced these three pages heavily.</p> <p>Here's what the UAlberta page has to say:</p> <p><img src="/assets/ptraceprotocol.webp" alt="ptrace&#39;s syscall tracing protocol"></p> <p>We fork and have the child call <code>PTRACE_TRACEME</code>. Then we handle each syscall entrance by calling <code>PTRACE_SYSCALL</code> and waiting with <code>wait</code> until the child has entered the syscall. It is in this moment we can mess with things.</p> <h3 id="implementation">Implementation</h3><p>Let's turn that graphic into Zig code.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@cImport</span><span class="p">({</span> <span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">&quot;sys/ptrace.h&quot;</span><span class="p">);</span> <span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">&quot;sys/user.h&quot;</span><span class="p">);</span> <span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">&quot;sys/wait.h&quot;</span><span class="p">);</span> <span class="w"> </span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">&quot;errno.h&quot;</span><span class="p">);</span> <span class="p">});</span> <span class="kr">const</span><span class="w"> </span><span class="n">cNullPtr</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span class="n">anyopaque</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span 
class="p">;</span> <span class="c1">// TODO //</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">argsAlloc</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span 
class="kr">const</span><span class="w"> </span><span class="n">pid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">fork</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Fork failed!</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">pid</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Child process</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_TRACEME</span><span class="p">,</span><span class="w"> </span><span class="n">pid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span 
class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">execv</span><span class="p">(</span> <span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">(),</span> <span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">..],</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Parent process</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">childPid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pid</span><span class="p">;</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">waitpid</span><span class="p">(</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="n">arena</span><span class="p">,</span><span class="w"> 
</span><span class="p">.</span><span class="n">childPid</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">childPid</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childInterceptSyscalls</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>So like the graphic suggested, we fork and start a child process. That means this Zig program should be called like:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build-exe<span class="w"> </span>--library<span class="w"> </span>c<span class="w"> </span>main.zig $<span class="w"> </span>./main<span class="w"> </span>/actual/program/to/intercept<span class="w"> </span>--and<span class="w"> </span>--its<span class="w"> </span>args </pre></div> <p>Presumably, as with strace or gdb, we could instead attach to an already running process with <code>PTRACE_ATTACH</code> or <code>PTRACE_SEIZE</code> (based on the <a href="https://man7.org/linux/man-pages/man2/ptrace.2.html">ptrace manpage</a>) rather than going the <code>PTRACE_TRACEME</code> route. 
But I haven't tried that out yet.</p> <p>With the child ready to be intercepted, we can implement the <code>ChildManager</code> that actually does the interception.</p> <h4 id="childmanager">ChildManager</h4><p>The core of the <code>ChildManager</code> is an infinite loop (at least as long as the child process lives) that waits for the next syscall and then calls a hook for the system call if it exists.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ChildManager</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span> <span class="w"> </span><span class="n">childPid</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">pid_t</span><span class="p">,</span> <span class="w"> </span><span class="c1">// TODO //</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childInterceptSyscalls</span><span class="p">(</span> <span class="w"> </span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ChildManager</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="c1">// Handle syscall entrance</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">W</span><span class="p">.</span><span class="n">IFEXITED</span><span class="p">(</span><span class="n">status</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">syscall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">syscall</span><span class="p">();</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">hooks</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">hook</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">syscall</span><span class="w"> 
</span><span class="o">==</span><span class="w"> </span><span class="n">hook</span><span class="p">.</span><span class="n">syscall</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">hook</span><span class="p">.</span><span class="n">hook</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">args</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>Later we'll write a hook for the <code>sys_write</code> syscall that will force an incomplete write.</p> <p>Back to the protocol, <code>childWaitForSyscall</code> will call <code>PTRACE_SYSCALL</code> to allow the child process to start up again and continue until the next syscall. 
We'll follow that by <code>wait</code>-ing for the child process to be stopped again so we can handle the syscall entrance.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childWaitForSyscall</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">)</span><span class="w"> </span><span class="kt">u32</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">status</span><span class="o">:</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_SYSCALL</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">waitpid</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">status</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="nb">@bitCast</span><span class="p">(</span><span class="n">status</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now that we've intercepted a syscall (after <code>waitpid</code> finishes blocking), we need to figure out what syscall it was. We do this by calling <code>PTRACE_GETREGS</code> and reading the <code>rax</code> register which, according to the <a href="https://stackoverflow.com/a/54957101/1507139">amd64/linux calling convention</a>, holds the syscall number.</p> <h4 id="registers">Registers</h4><p><code>PTRACE_GETREGS</code> fills out the <a href="https://sites.uclouvain.be/SystInfo/usr/include/sys/user.h.html">following struct</a>:</p> <div class="highlight"><pre><span></span><span class="k">struct</span><span class="w"> </span><span class="nc">user_regs_struct</span> <span class="p">{</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r15</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r14</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r13</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r12</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rbp</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rbx</span><span 
class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r11</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r10</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r9</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">r8</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rax</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rcx</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rdx</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rsi</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rdi</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">orig_rax</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span 
class="n">rip</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">cs</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">eflags</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">rsp</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">ss</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">fs_base</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">gs_base</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">ds</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">es</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">fs</span><span class="p">;</span> <span class="w"> </span><span class="kt">unsigned</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">gs</span><span class="p">;</span> <span class="p">};</span> </pre></div> <p>Let's write a little amd64/linux-specific wrapper for accessing meaningful 
fields.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">regs</span><span class="o">:</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">user_regs_struct</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">nth</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdi</span><span class="p">,</span> <span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span 
class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rsi</span><span class="p">,</span> <span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdx</span><span class="p">,</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setNth</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">assert</span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mi">3</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mi">0</span><span class="w"> </span><span 
class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rsi</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rdx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">result</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span><span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rax</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setResult</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">rax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">syscall</span><span class="p">(</span><span class="n">aa</span><span class="o">:</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">aa</span><span class="p">.</span><span class="n">regs</span><span class="p">.</span><span class="n">orig_rax</span><span class="p">;</span><span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>One thing to note is that the field we read to get <code>rax</code> is not <code>aa.regs.rax</code> but <code>aa.regs.orig_rax</code>. 
This is because <code>rax</code> also holds the return value, and with <code>PTRACE_SYSCALL</code> the child is stopped twice for some syscalls: once on entrance and once on exit. The <code>orig_rax</code> field preserves the original <code>rax</code> value from syscall entrance. You can read more about this <a href="https://stackoverflow.com/questions/6468896/why-is-orig-eax-provided-in-addition-to-eax/6469069#6469069">here</a>.</p> <h4 id="getting-and-setting-registers">Getting and setting registers</h4><p>Now let's write the <code>ChildManager</code> code that actually calls <code>PTRACE_GETREGS</code> to fill out one of these structs.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getABIArguments</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">)</span><span class="w"> </span><span class="n">ABIArguments</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ABIArguments</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">regs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_GETREGS</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span 
class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">args</span><span class="p">.</span><span class="n">regs</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">args</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Setting registers is similar; we just pass the struct back and call <code>PTRACE_SETREGS</code> instead:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_SETREGS</span><span class="p">,</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span><span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">args</span><span class="p">.</span><span class="n">regs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="a-hook">A hook</h4><p>Now we finally have enough code to write a hook that can get and set registers; i.e. 
manipulate a system call!</p> <p>We'll start by registering a <code>sys_write</code> hook in the <code>hooks</code> field we check in <code>childInterceptSyscalls</code> above.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">hooks</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">&amp;</span><span class="p">[</span><span class="n">_</span><span class="p">]</span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">syscall</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span> <span class="w"> </span><span class="n">hook</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="kr">const</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="p">(</span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="p">,</span> <span class="w"> </span><span class="p">}{.{</span> <span class="w"> </span><span class="p">.</span><span class="n">syscall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@intFromEnum</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">linux</span><span class="p">.</span><span class="n">syscalls</span><span class="p">.</span><span class="n">X64</span><span class="p">.</span><span class="n">write</span><span class="p">),</span> <span class="w"> </span><span class="p">.</span><span class="n">hook</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="n">writeHandler</span><span class="p">,</span> <span class="w"> </span><span class="p">}};</span> </pre></div> <p>If we look at the <a href="https://man7.org/linux/man-pages/man2/write.2.html">manpage for <code>write</code></a> we see it takes three arguments:</p> <ol> <li>The file descriptor (fd) to write to</li> <li>The address to start writing data from</li> <li>The number of bytes to write</li> </ol> <p>Going back to the <a href="https://stackoverflow.com/questions/2535989/what-are-the-calling-conventions-for-unix-linux-system-calls-and-user-space-f">calling convention</a>, that means the fd will be in <code>rdi</code>, the data address in <code>rsi</code>, and the data length in <code>rdx</code>.</p> <p>So if we shorten the data length, we should be creating a short (incomplete) write.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Truncate some bytes</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>In a more sophisticated version of this program, we could randomly decide when to truncate data and randomly decide how much data to truncate. However, for our purposes this is sufficient.</p> <p>But there are some real problems with this code. 
When I ran this program against a basic Go program, I saw duplicate requests.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Ah ok, PTRACE_SYSCALL gets hit when you both enter and exit a syscall.<br><br>So each time you call PTRACE_SYSCALL and you do stuff, you just call it again afterwards to handle/wait for the exit. <a href="https://t.co/PjmNwcMepG">pic.twitter.com/PjmNwcMepG</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1707846783035183267?ref_src=twsrc%5Etfw">September 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>So the deal with <code>PTRACE_SYSCALL</code> is that for (most?) syscalls, you get to modify data before it is actually handled by the kernel. And you get to modify data after the kernel has finished the syscall too.</p> <p>This makes sense because <code>PTRACE_SYSCALL</code> (unlike <code>PTRACE_SYSEMU</code>) allows the syscall to actually happen. And if we wanted to, for example, modify the syscall exit code, we'd have to do that after the syscall was done, not before it started. We are modifying registers directly, after all.</p> <p>All this means for our Zig code is that when we handle <code>sys_write</code>, we need to call <code>PTRACE_SYSCALL</code> again to process the syscall exit. 
Otherwise we'd reach this <code>writeHandler</code> for both entries and exits, which would require some additional way of disambiguating entrances from exits.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Truncate some bytes</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span 
class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childReadData</span><span class="p">(</span><span class="n">dataAddress</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Got a write on {}: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> 
</span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="c1">// Handle syscall exit</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p class="note"> We could put the <code>cm.childWaitForSyscall()</code> waiting for the syscall exit in the main loop and I did try that at first. However, not all syscalls seemed to have the same entry and exit hook and this resulted in the hooks sometimes starting with a syscall exit rather than a syscall entry. So rather than making the code more complicated, I decided to only wait for the exit on syscalls I knew had an exit (by observation at least), like <code>sys_write</code>. </p><h3 id="multiple-writes?-no-bad-logic?">Multiple writes? No bad logic?</h3><p>So I had this code as is, correctly handling syscall entrances and exits, but I was seeing multiple write calls. And the text file I was writing to had the complete text I wanted to write. There was no short write even though I truncated the data length.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Ok so what happens in this Go program if I truncate the amount of data?<br><br>I assumed Go would do nothing since all I did was call `f.Write()` once and `f.Write()` returns a number of bytes written.<br><br>But actually, it still writes everything! 
<a href="https://t.co/OSalKEbERM">pic.twitter.com/OSalKEbERM</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1707854642250408119?ref_src=twsrc%5Etfw">September 29, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>This took some digging into Go source code to understand. If you trace what <code>os.File.Write()</code> does on Linux you eventually get to <a href="https://cs.opensource.google/go/go/+/refs/tags/go1.21.1:src/internal/poll/fd_unix.go">src/internal/poll/fd_unix.go</a>:</p> <div class="highlight"><pre><span></span><span class="c1">// Write implements io.Writer.</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">FD</span><span class="p">)</span><span class="w"> </span><span class="nx">Write</span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">writeLock</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">writeUnlock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">prepareWrite</span><span class="p">(</span><span class="nx">fd</span><span class="p">.</span><span class="nx">isFile</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">max</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">IsStream</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">max</span><span class="o">-</span><span class="nx">nn</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">maxRW</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">max</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">maxRW</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ignoringEINTRIO</span><span class="p">(</span><span class="nx">syscall</span><span class="p">.</span><span class="nx">Write</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sysfd</span><span class="p">,</span><span class="w"> </span><span class="nx">p</span><span class="p">[</span><span class="nx">nn</span><span class="p">:</span><span class="nx">max</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">nn</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> 
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">syscall</span><span class="p">.</span><span class="nx">EAGAIN</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">pollable</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">pd</span><span class="p">.</span><span class="nx">waitWrite</span><span class="p">(</span><span class="nx">fd</span><span class="p">.</span><span class="nx">isFile</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">nn</span><span class="p">,</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ErrUnexpectedEOF</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>This might be common knowledge but I didn't realize Go did this. And when I tried out the same basic program in Python and even C, the behavior was the same. The builtin <code>write()</code> behavior on a file (in many languages apparently) is to retry until all data is written, with some exceptions.</p> <p>This makes sense since files on disk, unlike file descriptors backed by network sockets, are generally always available. Compared to a network connection, disks are physically close and almost always stay connected. (With some obvious exceptions like network-attached storage and thumb drives.)</p> <p>So to trigger the short write, the easiest way seems to be to have the <code>sys_write</code> call return an error that is NOT <code>EAGAIN</code> since the code will retry if that is the error.</p> <p>After looking through the <a href="https://man7.org/linux/man-pages/man2/write.2.html#ERRORS">list of errors that sys_write can return</a>, <code>EIO</code> seems like a nice one.</p> <p>So let's do our final version of <code>writeHandler</code> and on the syscall exit, we'll modify the return value (<code>rax</code> in amd64/linux) to be <code>-EIO</code>, since Linux syscalls report errors as negated errno values.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span 
class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Truncate some bytes</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span 
class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Handle syscall exit</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exitArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Force the writes to stop after the first one by returning EIO.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> 
</span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">-%</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">EIO</span><span class="p">;</span> <span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">setResult</span><span class="p">(</span><span class="n">result</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="o">&amp;</span><span class="n">exitArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Let's give it a whirl!</p> <h3 id="all-together">All together</h3><p>Build the Zig fault injector and the Go test code:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>zig<span class="w"> </span>build-exe<span class="w"> </span>--library<span class="w"> </span>c<span class="w"> </span>main.zig <span class="gp">$ </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span><span class="nb">test</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>main.go<span class="w"> </span><span class="o">)</span> </pre></div> <p>And run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>test/main </pre></div> <p>And check <code>test.txt</code>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.txt some<span class="w"> </span>great<span class="w"> </span>stu </pre></div> <p>Hey, that's a short write! 
:)</p> <h3 id="sidenote:-reading-data-from-the-child">Sidenote: Reading data from the child</h3><p>We accomplished everything we set out to, but there's one other useful thing we can do: reading the actual data passed to the write syscall.</p> <p>Just like how we can get the child process registers with <code>PTRACE_GETREGS</code>, we can read child memory with <code>PTRACE_PEEKDATA</code>. <code>PTRACE_PEEKDATA</code> takes the child process id and the memory address in the child to read from. It returns a word of data (which on amd64/linux is 8 bytes).</p> <p>We can use the syscall arguments (data address and length) to keep calling <code>PTRACE_PEEKDATA</code> on the child until we've read all bytes of the data the child process wanted to write:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">childReadData</span><span class="p">(</span> <span class="w"> </span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span> <span class="w"> </span><span class="n">address</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span> <span class="w"> </span><span class="n">length</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span 
class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">cm</span><span class="p">.</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">word</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">ptrace</span><span class="p">(</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">PTRACE_PEEKDATA</span><span class="p">,</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childPid</span><span class="p">,</span> <span class="w"> </span><span class="n">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="w"> </span><span class="n">cNullPtr</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">asBytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">word</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span 
class="n">byte</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">byte</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">data</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And we could modify <code>writeHandler</code> to print out the entirety of the write message each time (for debugging):</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeHandler</span><span class="p">(</span><span class="n">cm</span><span class="o">:</span><span class="w"> </span><span class="n">ChildManager</span><span class="p">,</span><span class="w"> </span><span class="n">entryArgs</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ABIArguments</span><span class="p">)</span><span class="w"> </span><span class="kt">anyerror</span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span 
class="n">fd</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">dataAddress</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Truncate some bytes</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="w"> </span><span class="n">entryArgs</span><span class="p">.</span><span class="n">setNth</span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="n">entryArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">data</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childReadData</span><span class="p">(</span><span class="n">dataAddress</span><span class="p">,</span><span class="w"> </span><span class="n">dataLength</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Got a write on {}: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">fd</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="c1">// Handle syscall exit</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">childWaitForSyscall</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exitArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">getABIArguments</span><span class="p">();</span> <span class="w"> </span><span class="n">dataLength</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">exitArgs</span><span class="p">.</span><span class="n">nth</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">dataLength</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Force the writes to stop after the first one by returning EIO.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="kt">c_ulong</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">-%</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">EIO</span><span class="p">;</span> <span class="w"> </span><span class="n">exitArgs</span><span class="p">.</span><span class="n">setResult</span><span class="p">(</span><span class="n">result</span><span class="p">);</span> <span class="w"> </span><span class="n">cm</span><span class="p">.</span><span class="n">setABIArguments</span><span class="p">(</span><span class="o">&amp;</span><span class="n">exitArgs</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>That's pretty neat!</p> <h3 id="next-steps">Next steps</h3><p>Short writes are just one of many bad IO interactions. 
Another fun one would be to completely buffer all writes on a file descriptor (not allowing anything to be written to disk at all) until fsync is called on the file descriptor. Or <a href="https://www.usenix.org/conference/atc20/presentation/rebello">forcing fsyncs to fail</a>.</p> <p>An interesting optimization would be to apply <a href="https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt">seccomp filters</a> so that rather than paying a penalty for watching every syscall, I only get notified about the ones I have hooks for like <code>sys_write</code>. <a href="https://www.alfonsobeato.net/c/filter-and-modify-system-calls-with-seccomp-and-ptrace/">Here's another post</a> that explores ptrace with seccomp filters.</p> <p>Credits: Thank you Charlie Cummings and Paul Khuong for reviewing a draft of this post!</p> <h3 id="selected-responses-after-publication">Selected responses after publication</h3><ul> <li>oscooter on Reddit <a href="https://www.reddit.com/r/linux/comments/16x32l3/comment/k380m9q/?utm_source=reddit&amp;utm_medium=web2x&amp;context=3">gave some tips</a> on using ptrace, including using <code>process_vm_readv</code> instead of <code>PTRACE_PEEKDATA</code> to read memory from the tracee process.</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Fault injection is a scary-sounding term. 
Intercepting and modifying Linux system calls sounds scary too.<br><br>But it&#39;s a neat way to trigger logical errors in programs, to build confidence we wrote code correctly.<br><br>Let&#39;s trigger short writes to disk in Zig!<a href="https://t.co/0C3tWt3vtT">https://t.co/0C3tWt3vtT</a> <a href="https://t.co/OS7auDe8jR">pic.twitter.com/OS7auDe8jR</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1708482934863180004?ref_src=twsrc%5Etfw">October 1, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-10-01-intercepting-and-modifying-linux-system-calls-with-ptrace.htmlSun, 01 Oct 2023 00:00:00 +0000How do databases execute expressions?http://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.html<p>Databases are fun. They sit at the confluence of Computer Science topics that might otherwise not seem practical in life as a developer. For example, every database with a query language is also a programming language implementation of some caliber. That doesn't include all databases though of course; see: RocksDB, FoundationDB, TigerBeetle, etc.</p> <p>This post looks at how various databases execute expressions in their query language.</p> <p>tldr; Most surveyed databases use a tree-walking interpreter. A few use stack- or register-based virtual machines. A couple have just-in-time compilers. And, tangentially, a few do vectorized interpretation.</p> <p class="note"> Throughout this post I'll use "virtual machine" as a shorthand for stack- or register-based loops that process a linearized set of instructions. I say this since it is sometimes fair to call a tree-walking interpreter a virtual machine. But that is not what I mean when I say virtual machine in this post. 
</p><h3 id="stepping-back">Stepping back</h3><p>Programming languages are typically implemented by turning an Abstract Syntax Tree (AST) into a linear set of instructions for a virtual machine (e.g. CPython, Java, C#) or native code (e.g. GCC's C compiler, Go, Rust). Some of the former implementations also generate and run Just-In-Time (JIT) compiled native code (e.g. Java and C#).</p> <p>Less commonly these days in programming languages does the implementation interpret off the AST or some other tree-like intermediate representation. This style is often called tree-walking.</p> <p>Shell languages sometimes do tree-walking. Otherwise, implementations that interpret directly off of a tree normally do so as a short-term measure before switching to compiled virtual machine code or JIT-ed native code (e.g. some JavaScript implementations, GraalVM, RPython, etc.)</p> <p>That is, while some major programming language implementations started out with tree-walking interpreters, they mostly moved away from solely tree-walking over a decade ago. See <a href="https://www.webkit.org/blog/189/announcing-squirrelfish/">JSC in 2008</a>, <a href="https://www.infoq.com/news/2007/12/ruby-19/">Ruby in 2007</a>, etc.</p> <p>My intuition is that tree-walking takes up more memory and is less cache-friendly than the linear instructions you give to a virtual machine or to your CPU. There are <a href="https://stefan-marr.de/downloads/oopsla23-larose-et-al-ast-vs-bytecode-interpreters-in-the-age-of-meta-compilation.pdf">some folks who disagree</a>, but they mostly talk about tree-walking when you've also got a JIT compiler hooked up. Which isn't quite the same thing. There has also been <a href="https://www.cs.cornell.edu/~asampson/blog/flattening.html">some early exploration and improvements</a> reported when tree-walking with a tree organized as an array.</p> <h4 id="and-databases?">And databases?</h4><p>Databases often interpret directly off a tree. 
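</p> <p>To make the two styles concrete, here is a minimal sketch in Python. It is not taken from any of the surveyed databases; the names <code>Num</code>, <code>BinOp</code>, <code>eval_tree</code>, <code>compile_expr</code>, and <code>run_vm</code> are all made up for illustration. The tree walker evaluates its arguments recursively and returns a value. The virtual machine loops over a flat list of instructions and mutates a stack.</p>

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Num, BinOp]

# Hypothetical operator table shared by both evaluators.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}


def eval_tree(expr):
    """Tree-walking style: evaluate the children, then return a value."""
    if isinstance(expr, Num):
        return expr.value
    return OPS[expr.op](eval_tree(expr.left), eval_tree(expr.right))


def compile_expr(expr, code=None):
    """Flatten the tree into a linear list of (opcode, argument) pairs."""
    if code is None:
        code = []
    if isinstance(expr, Num):
        code.append(("push", expr.value))
    else:
        compile_expr(expr.left, code)
        compile_expr(expr.right, code)
        code.append((expr.op, None))
    return code


def run_vm(code):
    """Stack-based style: operands are already evaluated and sit on the stack."""
    stack = []
    for opcode, arg in code:
        if opcode == "push":
            stack.append(arg)
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[opcode](left, right))
    return stack.pop()


# 1 + 2 * 3
expr = BinOp("+", Num(1), BinOp("*", Num(2), Num(3)))
tree_result = eval_tree(expr)
vm_result = run_vm(compile_expr(expr))
```

<p>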
(It isn't, generally speaking, fair to say they are AST-walking interpreters because databases typically transform and optimize beyond just an AST as parsed from user code.)</p> <p>But not all databases interpret a tree. Some have a virtual machine. And some generate and run JIT-ed native code.</p> <h3 id="methodology">Methodology</h3><p>If a core function (in the query execution path that does something like arithmetic or comparison) returns a value, that's a sign it's a tree-walking interpreter. Or, if you see code that is evaluating its arguments during execution, that's also a sign of a tree-walking interpreter.</p> <p>On the other hand, if the function mutates internal state such as by assigning a value to a context or pushing to a stack, that's a sign it's a stack- or register-based virtual machine. If a function pulls its arguments from memory and doesn't evaluate the arguments, that's also an indication it's a stack- or register-based virtual machine.</p> <p>This approach can result in false-positives depending on the architecture of the interpreter. User-defined functions (UDFs) would probably accept evaluated arguments and return a value regardless of how the interpreter is implemented. So it's important to find not just functions that could be implemented like UDFs, but core builtin behavior. Control flow implementations of functions like <code>if</code> or <code>case</code> can be great places to look.</p> <p>And tactically, I clone the source code and run stuff like <code>git grep -i eval | grep -v test | grep \\.java | grep -i eval</code> or <code>git grep -i expr | grep -v test | grep \\.go | grep -i expr</code> until I convince myself I'm somewhere interesting.</p> <p>Note: In talking about a broad swath of projects, maybe I've misunderstood one or some. If you've got a correction, let me know! If there's a proprietary database you work on where you can link to the (publicly described) execution strategy, feel free to pass it along! 
Or if I'm missing your public-source database in this list, send me a message!</p> <h3 id="survey">Survey</h3><h4><a href="https://github.com/cockroachdb/cockroach">Cockroach</a> (Ruling: Tree Walker)</h4><p>Judging by functions like <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L105"><code>func (e *evaluator) EvalBinaryExpr</code></a> that <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L106">evaluates the left-hand side</a> and then <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/sem/eval/expr.go#L113">evaluates the right-hand side</a> and returns a value, Cockroach looks like a tree walking interpreter.</p> <p>It gets a little more interesting though, since Cockroach also <a href="https://www.cockroachlabs.com/docs/stable/vectorized-execution">supports</a> vectorized expression execution. Vectorizing is a fancy term for acting on many pieces of data at once rather than one at a time. It doesn't necessarily imply SIMD. Here is an example of a <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/colexec/colexecproj/proj_non_const_ops.eg.go#L4427">vectorized addition</a> of two int64 columns.</p> <h4><a href="https://github.com/ClickHouse/clickhouse">ClickHouse</a> (Ruling: Tree Walker + JIT)</h4><p>The ClickHouse architecture is a little unique and difficult for me to read through – likely due to it being fairly mature, with serious optimization. But they tend to document their header files well. So files like <a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/IFunction.h">src/Functions/IFunction.h</a> and <a href="https://github.com/ClickHouse/ClickHouse/blob/9af9b4a08542812694f171833a7afe08f5aaaafb/src/Interpreters/ExpressionActions.h">src/Interpreters/ExpressionActions.h</a> were helpful.</p> <p>They have also spoken publicly about their pipeline execution model; e.g. 
<a href="https://presentations.clickhouse.com/meetup24/5.%20Clickhouse%20query%20execution%20pipeline%20changes/">this presentation</a> and <a href="https://github.com/ClickHouse/ClickHouse/issues/34045">this roadmap issue</a>. But it isn't completely clear how much pipeline execution (which is broader than just expression evaluation) connects to expression evaluation.</p> <p>Moreover, they have <a href="https://clickhouse.com/blog/clickhouse-just-in-time-compiler-jit">publicly spoken</a> about their support for JIT compilation for query execution. But let's look at how execution works when the JIT is not enabled. For example, if we take a look at how <a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp"><code>if</code> is implemented</a>, we know that the <code>then</code> and <code>else</code> rows must be conditionally evaluated.</p> <p>In the runtime entrypoint, <a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp#L1048"><code>executeImpl</code></a>, we see the function call <a href="https://github.com/ClickHouse/ClickHouse/blob/853e3f0aa789d5b6dcb251a403276d9fdc02902c/src/Functions/if.cpp#L983"><code>executeShortCircuitArguments</code></a> which in turn calls <a href="https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnFunction.cpp#L280"><code>ColumnFunction::reduce()</code></a> which <a href="https://github.com/ClickHouse/ClickHouse/blob/master/src/Columns/ColumnFunction.cpp#L299">evaluates each column vector that is an argument</a> to the function and then calls execute on the function.</p> <p>So from this we can tell the non-JIT execution is a tree walker and that it is over <a href="https://twitter.com/ClickHouseDB/status/1705619463888900538">chunks of columns</a>, i.e. vectorized data, similar to Cockroach. 
However in ClickHouse execution is <em>always</em> over column vectors.</p> <p class="note"> In the original version of this post, I had some confusion about the ClickHouse execution strategy. Robert Schulze from ClickHouse <a href="https://clickhousedb.slack.com/archives/CUDSPUJ68/p1695307656700889">helped clarify</a> things for me. Thanks Robert! </p><h4><a href="https://github.com/duckdb/duckdb">DuckDB</a> (Ruling: Tree Walker)</h4><p>If we take a look at how <a href="https://github.com/duckdb/duckdb/blob/479c89e154f32012143d741c1a4f4d769f20044e/src/execution/expression_executor/execute_function.cpp#L59">function expressions are executed</a>, we can see each <a href="https://github.com/duckdb/duckdb/blob/479c89e154f32012143d741c1a4f4d769f20044e/src/execution/expression_executor/execute_function.cpp#L66">argument in the function being evaluated</a> before being passed to the actual function. So that looks like a tree walking interpreter.</p> <p>Like ClickHouse, DuckDB expression execution is always over column vectors. You can read more about this architecture <a href="https://duckdb.org/internals/vector.html">here</a> and <a href="https://www.infoq.com/articles/analytical-data-management-duckdb/">here</a>.</p> <h4><a href="https://github.com/influxdata/influxdb">Influx</a> (Ruling: Tree Walker)</h4><p>Influx originally had a SQL-like query language called InfluxQL. If we look at <a href="https://github.com/influxdata/influxdb/blob/b3b982d746fdc34451ca44d262f83b483cd9ea33/storage/reads/influxql_eval.go#L41">how it evaluates a binary expression</a>, it first evaluates the left-hand side and then the right-hand side before operating on the sides and returning a value. That's a tree-walking interpreter.</p> <p><a href="https://github.com/influxdata/flux">Flux</a> was the new query language for Influx. 
While the Flux <a href="https://github.com/influxdata/flux/blob/master/docs/VirtualMachine.md">docs</a> suggest they transform to an intermediate representation that is executed on a virtual machine, there's nothing I'm seeing that looks like a stack- or register-based virtual machine. All the <a href="https://github.com/influxdata/flux/blob/master/interpreter/interpreter.go#L352">evaluation functions</a> evaluate their arguments and return a value. That looks like a tree-walking interpreter to me.</p> <p>Today Influx <a href="https://www.influxdata.com/blog/the-plan-for-influxdb-3-0-open-source/">announced</a> that Flux is in maintenance mode and they are focusing on InfluxQL again.</p> <h4><a href="https://github.com/MariaDB/server">MariaDB</a> / <a href="https://github.com/mysql/mysql-server">MySQL</a> (Ruling: Tree Walker)</h4><p>Control flow methods are normally a good way to see how an interpreter is implemented. The implementation of COALESCE <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_cmpfunc.cc#L3431">looks pretty simple</a>. We see it <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_cmpfunc.cc#L3442">call <code>val_str()</code></a> for each argument to COALESCE. But I can only seem to find implementations of <code>val_str()</code> on raw values and not expressions. <code>Item_func_coalesce</code> itself does not implement <code>val_str()</code> for example, which would be a strong indication of a tree walker. Maybe it does implement <code>val_str()</code> through inheritance.</p> <p>It becomes a little clearer if we look at non-control flow methods like <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/item_func.cc#L2048"><code>acos</code></a>. In this method we see <code>Item_func_acos</code> itself implement <code>val_real()</code> and also call <code>val_real()</code> on all its arguments. 
In this case it's obvious how the control flow of <code>acos(acos(.5))</code> would work. So that seems to indicate expressions are executed with a tree walking interpreter.</p> <p>I also noticed <a href="https://github.com/MariaDB/server/blob/e9573c059656d9477c2176f102f7e79d0f1ca6b0/sql/sp_instr.cc">sql/sp_instr.cc</a>. That is scary (in terms of invalidating my analysis) since it looks like a virtual machine. But after looking through it, I think this virtual machine only corresponds to how stored procedures are executed, hence the <code>sp_</code> prefix for Stored Programs. <a href="https://dev.mysql.com/doc/dev/mysql-server/latest/stored_programs.html">MySQL docs</a> also explain that stored procedures are executed with a bytecode virtual machine.</p> <p>I'm curious why they don't use that virtual machine for query execution.</p> <p>As far as I can tell MySQL and MariaDB do not differ in this regard.</p> <h4><a href="https://github.com/mongodb/mongo">MongoDB</a> (Ruling: Virtual Machine)</h4><p>Mongo <a href="https://laplab.me/posts/inside-new-query-engine-of-mongodb/">recently introduced</a> a virtual machine for executing queries, called Slot Based Execution (SBE). We can find the SBE code in <a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9313">src/mongo/db/exec/sbe/vm/vm.cpp</a> and the main virtual machine entrypoint <a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9313">here</a>. <a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/exec/sbe/vm/vm.cpp#L9419">Looks like</a> a classic stack-based virtual machine!</p> <p>It isn't completely clear to me if the SBE path is always used or if there are still cases where it falls back to their old execution model. 
You can read more about Mongo execution <a href="https://github.com/mongodb/mongo/blob/master/src/mongo/db/query/README.md">here</a> and <a href="https://www.mongodb.com/docs/manual/reference/sbe/">here</a>.</p> <h4><a href="https://github.com/postgres/postgres">PostgreSQL</a> (Ruling: Virtual Machine + JIT)</h4><p>The top of PostgreSQL's <a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExprInterp.c#L6">src/backend/executor/execExprInterp.c</a> clearly explains that expression execution uses a virtual machine. You see all the hallmarks: opcodes, a loop over a giant switch, etc. And if we look at how <a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExprInterp.c#L728">function expressions are executed</a>, we see another hallmark which is that the function expression code doesn't evaluate its arguments. They've already been evaluated. And function expression code just acts on the results of its arguments.</p> <p>PostgreSQL also <a href="https://github.com/postgres/postgres/blob/master/src/backend/jit/README">supports</a> JIT-ing expression execution. And we can find the switch between interpreting and JIT-compiling an expression <a href="https://github.com/postgres/postgres/blob/cca97ce6a6653df7f4ec71ecd54944cc9a6c4c16/src/backend/executor/execExpr.c#L873">here</a>.</p> <h4><a href="https://github.com/questdb/questdb">QuestDB</a> (Ruling: Tree Walker + JIT)</h4><p>QuestDB <a href="https://questdb.io/blog/2022/01/12/jit-sql-compiler/">wrote about their execution engine recently</a>. When the conditions are right, they'll <a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/SqlCodeGenerator.java#L1394">switch over to a JIT compiler</a> and run native code.</p> <p>But let's look at the default path. 
For example, how <a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/engine/functions/bool/AndFunctionFactory.java#L82"><code>AND</code> is implemented</a>. <code>AndBooleanFunction</code> implements <code>BooleanFunction</code> which implements <code>Function</code>. An expression can be evaluated by calling a <code>getX()</code> method on the expression type that implements <code>Function</code>. <code>AndBooleanFunction</code> calls <code>getBool()</code> on its left and right hand sides. And if we look at the <a href="https://github.com/questdb/questdb/blob/11ac85510292596f0d21b10603e500f8edb5e486/core/src/main/java/io/questdb/griffin/engine/functions/BooleanFunction.java#L35">partial implementation</a> of <code>BooleanFunction</code> we'll also see it doing <code>getX()</code> specific conversions during the call of <code>getX()</code>. So that's a tree-walking interpreter.</p> <h4><a href="https://github.com/scylladb/scylladb">Scylla</a> (Ruling: Tree Walker)</h4><p>If we take a look at how <a href="https://github.com/scylladb/scylladb/blob/08197882074227edbd0a95f49914913e3124753d/cql3/expr/expression.cc#L2145">functions are evaluated</a> in Scylla, we see function evaluation first <a href="https://github.com/scylladb/scylladb/blob/08197882074227edbd0a95f49914913e3124753d/cql3/expr/expression.cc#L2161">evaluating all of its arguments</a>. And the function evaluation function itself returns a <code>cql3::raw_value</code>. So that's a tree-walking interpreter.</p> <h4><a href="https://github.com/sqlite/sqlite">SQLite</a> (Ruling: Virtual Machine)</h4><p>SQLite's virtual machine is <a href="https://www.sqlite.org/opcode.html">comprehensive and well-documented</a>. 
It covers not just expression evaluation but the entirety of query execution.</p> <p>We can find the massive virtual machine switch in <a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L971">src/vdbe.c</a>.</p> <p>And if we look, for example, at how <code>AND</code> is implemented, we see it <a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L2536">pulling its arguments out of memory</a> (already evaluated) and assigning the result back to <a href="https://github.com/sqlite/sqlite/blob/8aaf63c6ac8b8292c0ecead0d2b04b68e9e6be78/src/vdbe.c#L2545">a designated point in memory</a>.</p> <h4>SingleStore (Ruling: Virtual Machine + JIT)</h4><p>While there's no source code to link to, SingleStore <a href="https://www.youtube.com/watch?v=_vloWsdPCDs&amp;t=3810s">gave a talk at CMU</a> that broke down their query execution pipeline. Their <a href="https://docs.singlestore.com/cloud/query-data/advanced-query-topics/code-generation/">docs</a> also cover the topic.</p> <p><img src="/assets/memsql.webp" alt="SingleStore compiler pipeline"></p> <h4><a href="https://github.com/pingcap/tidb">TiDB</a> (Ruling: Tree Walker)</h4><p>Similar to DuckDB and ClickHouse, TiDB implements vectorized interpretation. They've <a href="https://www.pingcap.com/blog/10x-performance-improvement-for-expression-evaluation-made-possible-by-vectorized-execution/">written publicly about their switch to this method</a>.</p> <p>Let's take a look at how <code>if</code> is implemented in TiDB. There is a non-vectorized and a vectorized version of <code>if</code> (in <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go">expression/builtin_control.go</a> and <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control_vec_generated.go">expression/builtin_control_vec_generated.go</a> respectively). 
So maybe they haven't completely switched over to vectorized execution or maybe it can only be used in some conditions.</p> <p>If we look at the <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L599">non-vectorized version of <code>if</code></a>, we see the <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L600">condition evaluated</a>. And then the <code>then</code> or <code>else</code> is evaluated <a href="https://github.com/pingcap/tidb/blob/3ccd09e63addddeb0d33b5b87594a2d61fffd1d8/expression/builtin_control.go#L604">depending on the result of the condition</a>. That's a tree-walking interpreter.</p> <h3 id="conclusion">Conclusion</h3><p>As the DuckDB team <a href="https://duckdb.org/why_duckdb.html">points out</a>, vectorized interpretation or JIT compilation <a href="https://www.vldb.org/pvldb/vol11/p2209-kersten.pdf">seem like the future</a> for database expression execution. These strategies seem particularly important for analytics or time-series workloads. But vectorized interpretation seems to make the most sense for column-wise storage engines. And column-wise storage normally only makes sense for analytics workloads. Still, TiDB and Cockroach are transactional databases that also vectorize execution.</p> <p>And while SQLite and PostgreSQL use the virtual machine model, it's possible databases with tree-walking interpreters like Scylla and MySQL/MariaDB have decided there are not significant enough gains to be had (for transactional workloads) to justify the complexity of moving to a compiler + virtual machine architecture.</p> <p>Tree-walking interpreters and virtual machines are also independent from whether or not execution is vectorized. 
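</p> <p>As a rough illustration of that independence, here is a sketch in Python of the same tree walk done row-at-a-time and then over a batch of columns. The <code>Col</code> and <code>Add</code> node types and both evaluators are hypothetical and not modeled on any particular database. The structure of the interpreter is identical in both cases; only the unit of data each node returns changes.</p>

```python
from dataclasses import dataclass


@dataclass
class Col:
    name: str


@dataclass
class Add:
    left: object
    right: object


def eval_scalar(expr, row):
    """Tree walker over one row (a dict keyed by column name)."""
    if isinstance(expr, Col):
        return row[expr.name]
    return eval_scalar(expr.left, row) + eval_scalar(expr.right, row)


def eval_vectorized(expr, batch):
    """Same tree walk, but every node returns a whole column for the batch."""
    if isinstance(expr, Col):
        return batch[expr.name]
    left = eval_vectorized(expr.left, batch)
    right = eval_vectorized(expr.right, batch)
    # One pass over the batch per operator instead of one tree walk per row.
    return [a + b for a, b in zip(left, right)]


# a + (b + a)
expr = Add(Col("a"), Add(Col("b"), Col("a")))
batch = {"a": [1, 2, 3], "b": [10, 20, 30]}
rows = [{"a": a, "b": b} for a, b in zip(batch["a"], batch["b"])]

row_results = [eval_scalar(expr, row) for row in rows]  # walks the tree 3 times
vec_results = eval_vectorized(expr, batch)              # walks the tree once
```

<p>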
So that will be another interesting dimension to watch: if more databases move toward vectorized execution even if they don't adopt JIT compilation.</p> <p>Yet another alternative is that maybe as databases mature we'll see compilation tiers similar to what <a href="https://webkit.org/blog/9329/a-new-bytecode-format-for-javascriptcore/">browsers do</a> <a href="https://v8.dev/blog/sparkplug">with JavaScript</a>.</p> <p>Credits: Thanks Max Bernstein, Alex Miller, and Justin Jaffray for reviewing a draft version of this! And thanks to the #dbs channel on <a href="https://eatonphil.com/discord.html">Discord</a> for instigating this post!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I spent some time looking into how various databases execute expressions in their query language.<br><br>Most of them have a tree-walking interpreter, some have a virtual machine, and some do just-in-time compilation.<br><br>Let&#39;s dig into some database code to see!<a href="https://t.co/BIGtHKh1X4">https://t.co/BIGtHKh1X4</a> <a href="https://t.co/nmhe9HmYw7">pic.twitter.com/nmhe9HmYw7</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1704936432412868725?ref_src=twsrc%5Etfw">September 21, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-09-21-how-do-databases-execute-expressions.htmlThu, 21 Sep 2023 00:00:00 +0000Eight years of organizing tech meetupshttp://notes.eatonphil.com/eight-years-of-tech-meetups.html<p>This is a collection of random personal experiences. So if you don't want to read everything, feel free to skip to the end for takeaways.</p> <p>I write because I'd like to see more high-quality meetups. And maybe my little bit of experience will help someone out.</p> <h3 id="2015:-philadelphia">2015: Philadelphia</h3><p>I first tried to organize a meetup in Philly in 2015. 
I was contracting at the time and I figured a meetup might be a good way to source contracts or just meet interesting people. I created the "Philadelphia Software in Business" (or some other similarly vaguely named) group on Meetup.com.</p> <p>I didn't have any network; the first companies I worked for were not in Philly. But Meetup.com got me a few tens of people joining the group.</p> <p>My first challenge was finding a place to meet. I didn't know what I was doing so I looked at restaurants, bars, and cafes for dedicated event space. Needless to say, renting space was expensive on its own. And there was always an additional required minimum dollar spent per attendee.</p> <p>I ultimately found a place near the Schuylkill River. Maybe it was a community event space. Maybe I paid for it. I can't remember.</p> <p>The first and only time I hosted an event for the group, I got a surprising number of people for such a vague topic. There were maybe 6 of us. I was the youngest by far (I was 20); they were middle-aged. Excel users and one visionary type.</p> <p>There was no real point to the meetup and I didn't continue doing it.</p> <h3 id="2016---2017:-linode">2016 - 2017: Linode</h3><p>While I was at Linode, I organized "hack nights". I didn't ask for anyone's approval before starting it. I just said I'd be ordering pizza for anyone interested in staying after work to hack on Linode-related projects. I was willing to pay for the pizza, in part because I didn't want to risk being shut down by asking. But caker paid for it each time.</p> <p>I was nervous because people would show up and ask for pizza and not want to hack. It was company-provided under the aspiration of doing Linode-related work. Maybe I mentioned this or not. I can't remember. I'm pretty sure they got their pizza.</p> <p>Aside from myself, developers at Linode didn't really attend. 
The folks who attended were support staff or folks from the technical writing team who wanted more experience coding.</p> <p>I ran this for maybe 3 to 5 Wednesdays before not continuing. It was pretty fun! But staying after work for a few hours each Wednesday lost its charm.</p> <h4 id="book-club">Book Club</h4><p>Another time at Linode I started a book club. I was very torn about attempting to make the book club open to anyone in the area or just to Linode employees.</p> <p>I knew I'd probably get more people to attend if I made it public. But I wasn't sure if Linode would be cool with having external folks in the office. Before they moved to the Old City office, visitors weren't really a thing.</p> <p>So I made it private to Linode. And I started with the most obvious book for your average developer: Practical Common Lisp.</p> <p>I am pretty sure I learned one big trick by this time though. When I announced I'd be starting the book club I said something like this:</p> <blockquote><p>Hey folks! I'm thinking of starting a book club. A book I have in mind to start with is Practical Common Lisp. If I get at least one other person to join in then I'll move forward!</p> </blockquote> <p>I ended up getting two folks: one developer and one support staff member. We held the book club for 30 minutes once a week, covering one chapter each week. I was the only one who read anything I think, but the other two guys faithfully showed up for discussion.</p> <p>I didn't ask for permission to do this either. And this time we met during company time. I think it was 2-2:30PM.</p> <p>It was fun. We finished the book. But Practical Common Lisp probably wasn't a good choice. And I don't think I started a second book.</p> <h3 id="2017---2020:-false-starts">2017 - 2020: False starts</h3><p>I moved to NYC and joined a small startup (~20 employees). Linode was 100+ employees.</p> <p>We were in a WeWork so I considered starting a book club that was public to the WeWork. 
I had learned by then the law of numbers: I probably wouldn't get anyone from my company to join.</p> <p>I considered putting up posters around the WeWork to advertise. But in the end, I didn't end up going through with anything.</p> <p>I did present at a few meetups in NYC during this time. But I didn't organize anything.</p> <p>And then the pandemic hit and everything disappeared.</p> <h3 id="2021---2022:-virtual">2021 - 2022: Virtual</h3><p>In 2021 I started contracting again, thinking about starting a company. I wanted a community to be at the center.</p> <p>So I started a <a href="https://eatonphil.com/discord.html">Discord focused on software internals</a>.</p> <p>I had a bit more of a network at this point so I posted about the Discord on Twitter and got 100 likes or something and slowly started gaining folks in the Discord.</p> <p>I knew it was going to do better if I was pretty active in it so I made sure to post interesting blog posts at some regular interval. About compilers or databases or something.</p> <p>The Discord didn't turn out to help me out much in the starting-a-company front. Or I didn't use it effectively for that.</p> <p>I wanted more of an independent Discord of cool people who like to learn about systems internals. And that's what I got.</p> <p>This turned out to be ok though because I stopped working on that company and the Discord is still around and I still get to hang out with cool people.</p> <p>This Discord is still around and hit 1,700 members recently. Among other things, it has developers from many different database companies in it these days. 
They hang out and help out the noobs like me learn about database internals.</p> <p>I culled inactive members recently, so today the total is around 1,100.</p> <h4 id="hacker-nights">Hacker Nights</h4><p>During the pandemic I became frustrated that all the good meetups disappeared so I decided to start an online one that would be somewhat tied to the Discord and be about software internals.</p> <p>I would find 2 or 3 people to present for 10-20 minutes each on anything to do with software internals. We'd meet once a month at 8PM NY time I think.</p> <p>To get speakers I'd mostly DM people who I saw do interesting things on Twitter or Hacker News. I was lucky to have <a href="https://www.philipotoole.com/">Philip O'Toole</a> (author of rqlite), <a href="https://sirupsen.com/">Simon Eskildsen</a> (author of the Napkin Math blog), <a href="https://rsms.me/">Rasmus Andersson</a>, and many other excellent folks speak.</p> <p>You can find <a href="https://www.youtube.com/playlist?list=PL2t91m2Rvccpg2q2o_8lfuTYUhoP3AMwq">videos of these talks on YouTube</a>.</p> <p>The events were organized on Meetup.com. The group grew quickly and I'd have about 100 people RSVP to each event. 10-20 normally showed up.</p> <p>I'd post a Zoom link on Meetup.com. Sometimes Meetup.com crashed right as the meetup started, so no one could get a link. That was fun.</p> <p>On two different nights I had Zoom bombers show up and play crazy music or impersonate other members of the call and act weirdly (Zoom lets you change your name after you've joined the call).</p> <p>I learned a little bit about how to administrate a Zoom meeting.</p> <p>I ran Hacker Nights for 5 months. It was tiring to find speakers, tiring to deal with Zoom bombers. It was thankless and I wasn't really enjoying it.</p> <p>I was proud though that I was offering a channel for developers to learn about software internals of compilers, databases, etc. 
And it was great to meet many interesting speakers and attendees.</p> <h3 id="2023:-designing-data-intensive-applications">2023: Designing Data Intensive Applications</h3><p>A month ago I put out a call on Twitter for folks in NYC interested in reading through the book Designing Data Intensive Applications.</p> <p>I'd read the book before and while it was challenging, I knew it was immensely useful to any developer who works with data or an API.</p> <p>By this time I'd learned my second trick: not asking for public responses.</p> <p>I said something like:</p> <blockquote><p>Hey folks! I'm thinking of starting a book club meeting in Midtown NYC reading through Designing Data Intensive Applications. DM me if you'd be interested! If I get 2 other interested folks this will be on!</p> </blockquote> <p>I got maybe 40 DMs and 20 of them were based in NYC. Attendance thus would have been higher if I made the book club virtual. But virtual events take about as much effort as in-person events and somehow feel less rewarding. So I went through with the NYC group.</p> <p>I'm sure I could have gotten some company to provide us space, but this would just mean more negotiation for me and tedium for everyone involved (bring your ID to be checked in, make sure you're registered, etc.).</p> <p>The group would meet every 2 weeks and cover 2 chapters at a time. We'd meet for 30 minutes. To avoid needing to find a place to meet, we'd meet in public at Bryant Park. (There turns out to be plenty of available seating on Fridays at 9AM in Bryant Park. When it rains we meet online.)</p> <p>I wanted to keep the overhead minimal and the timeline slightly aggressive. We'd be through the book in only 3 months. No crazy commitment.</p> <p>We've met twice now and are 25% done with the book. 
Attendance has been around 7 to 9 people each time so far, or a little less than 50%.</p> <p>They're almost all software developers, with one manager I think, who work for a variety of large and small tech companies.</p> <p>I'm loving it so far. And if it continues to go well, I'll probably continue running in-person book clubs.</p> <p>But it would only meet a few months a year, giving me a few months' break from running it.</p> <h3 id="takeaways:-the-meh">Takeaways: The meh</h3><p>Organizing any event takes effort. Meetups are especially hard because you need to find a place to run the meetup, you probably want to provide food, and you need to find speakers.</p> <p>Often you can find a single place to host the meetup, but you have to constantly search for new speakers. Even one of the greatest meetups in NYC, Papers We Love, seems to be struggling to find speakers.</p> <p>The <a href="https://db.cs.cmu.edu/seminar2023/">CMU Database Group</a> and the <a href="http://charap.co/category/reading-group/">Distributed Systems Reading Group</a> seem to have the right idea though. They only run sessions part of the year, and they plan out all sessions in advance (including speakers).</p> <p>However, they are both virtual. And I'm not so interested in running virtual events anymore.</p> <h3 id="takeaways:-the-good">Takeaways: The good</h3><p>For one, meetups are an awesome way to meet random people and expand your network.</p> <p>Two, they're educational. Even beyond the content you are meeting about, there's the discussion alongside it you wouldn't get by yourself. And you, as organizer, get to pick the topic.</p> <p>These work out great for me. I love to meet people, and I love to learn.</p> <h4 id="tricks">Tricks</h4><p>Starting something new is embarrassing because you're putting yourself out there. Maybe no one in your network shares your interests (to the degree or in the direction you do).</p> <p>My tricks are:</p> <ul> <li>Most importantly: keep things low key! 
Don't stress people out. Before learning and networking, the point of meetups is (or should be) fun.</li> <li>Saying you are "thinking about X" is a lightweight way to gauge interest. As compared to just saying you're "starting X", which gives you less room to back out if there turns out not to be interest.</li> <li>Asking people to DM you with interest is less embarrassing than asking for people to respond in public. Not everyone would want to respond in public. If there's interest in private, you can share the interest in public later on. But if you only ask for responses in public and there are no responses, that can feel embarrassing.</li> <li>Indicating success criteria can help people understand how big you're thinking. I'm normally fine with doing something as small as only two other people, so I say that. It's kind of like how Kickstarters work with minimum funding levels.</li> </ul> <p>These ideas apply to corporate planning too. I think about them when I'm sharing some new idea in company Slack as much as when I share on Twitter.</p> <p>A note on attendance rates: 10-20% actual attendance versus RSVP seems normal. If you get a higher percentage of people actually attending versus RSVP-ing you're doing pretty well!</p> <h4 id="finding-sponsors">Finding sponsors</h4><p>One final idea is about paying for space or paying for food. Companies with space and money for food are often willing to partner with folks willing to do the work to run an event.</p> <p>Running your own event in a company's space is advertising for them. They get to be associated with cool tech. It's a chance for them to pitch their open positions.</p> <p>Obviously this happens often when you start a meetup hosted by your own company. 
But you can also find other companies to provide space.</p> <p>The kind of people to find to make this happen are senior developers or engineering managers, often on Twitter and sometimes on LinkedIn.</p> <p>I haven't done this myself yet because I'm not ready to commit to running a meetup. But I see it happen. And it's the approach I'd take if I were to run a real meetup again.</p> <p>Though now that I've got some time off there are a few talks I'd like to do myself.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post on my experience organizing tech meetups of various stripes over the years. And a few things I&#39;ve learned.<br><br>&quot;meetups&quot; taken pretty broadly to include online communities, book clubs, and actual speaker events.<a href="https://t.co/xnd0LTneup">https://t.co/xnd0LTneup</a> <a href="https://t.co/w1oEaSNDHb">pic.twitter.com/w1oEaSNDHb</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1698793650036031753?ref_src=twsrc%5Etfw">September 4, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/eight-years-of-tech-meetups.htmlMon, 04 Sep 2023 00:00:00 +0000Thinking about functional programminghttp://notes.eatonphil.com/2023-08-15-thinking-about-functional-programming.html<p>Someone on Discord asked about how to learn functional programming.</p> <p>The question and my initial tweet on the subject prompted an interesting <a href="https://twitter.com/ShriramKMurthi/status/1691548254331092992">discussion</a> with <a href="https://twitter.com/ShriramKMurthi">Shriram Krishnamurthi</a> and other folks.</p> <p>So here's a slightly more thought out exploration.</p> <p>And just for backstory's sake: I spent a few years a while ago <a href="https://github.com/eatonphil/ponyo">programming in Standard ML</a> and I wrote a <a href="https://github.com/eatonphil/bsdscheme">chunk of a Scheme implementation</a>. 
I'm not an expert, but I have a bit of background.</p> <p>Hey, this is a free opinion.</p> <h3 id="concepts-from-functional-programming">Concepts from functional programming</h3><p>When people talk about functional programming, I think of a few key choices you can make while programming:</p> <ul> <li>Immutability by default</li> <li>(Tail) recursion by default</li> <li>First-class functions (and the suite of tools that goes along with them, e.g. map, reduce/fold)</li> </ul> <p>And if you have experience as a programmer, you either get the basic gist of these tenets or you can easily read about the basics.</p> <p>That is, while most programmers I've met understood the basics, most programmers I've met were not particularly <em>comfortable</em> or <em>fluent</em> expressing programs with these ideas.</p> <p>For myself, the only way I got comfortable expressing code with these ideas was lots of practice (as I mentioned above). And yet, even after I did a bunch of programming in Standard ML and Scheme, I really didn't see a particular benefit to practicing in a language other than one with which I was already generally comfortable.</p> <p>You have to learn a lot of other random things when you pick up Scheme or Standard ML that aren't just: practice immutability by default, recursion, and first-class functions.</p> <p>So I think it's kind of misguided when person A asks how to learn functional programming and person B responds that they should learn Haskell or OCaml or whatever. I see this happen pretty often online.</p> <p class="note"> Beyond any "language for functional programming" as a recommendation in general, Haskell is a particularly egregious suggestion to make in my opinion because not only are you trying to practice functional programming tenets but you're also dealing with a complex type system and lazy evaluation. 
</p><p>Instead, <a href="https://notes.eatonphil.com/practicing-recursion.html">practice immutability, recursion, map/reduce</a> in whatever language you like.</p> <h3 id="programming-languages">Programming languages</h3><p>If you want to study programming languages, that's awesome. However, functional programming doesn't really have any direct connection to studying programming languages.</p> <p>Languages are all over the place. Scheme, Standard ML, and Haskell are worlds apart, even within the functional programming family.</p> <p>And modern languages have mostly adopted the aspects of functional programming that used to be unique 20 years ago.</p> <p>Moreover, there are many other worthwhile families of languages to learn about:</p> <ul> <li>Imperative/C-like (ok, you probably already know these)</li> <li>Stack-based (JVM, x86 assembly sort of, Forth)</li> <li>Array-oriented (APL, J)</li> <li>Declarative (CSS, SQL, TLA+, Prolog)</li> <li>Data (HTML, JSON, YAML)</li> <li>Proof assistants (Isabelle/HOL, Coq)</li> </ul> <p>The list isn't exhaustive, and the variations within families can be massive. But the point is that functional programming doesn't mean crazy programming languages or crazy programming ideas. Functional programming is a <em>subset</em> of crazy programming languages and crazy programming ideas.</p> <p>If you want to learn about crazy programming languages and crazy programming ideas, you should! Go for it!</p> <h3 id="introduction-to-computer-science">Introduction to Computer Science</h3><p>SICP is famous as the (former) introductory textbook for computer science at MIT, and for its use of Scheme and the <a href="https://en.wikipedia.org/wiki/Meta-circular_evaluator">Metacircular Evaluator</a>.</p> <p>I don't have any experience teaching beginners how to program so I don't have thoughts on whether this made sense. 
That's for folks like <a href="https://twitter.com/ShriramKMurthi/status/1691548254331092992">Shriram</a> to think about.</p> <p>However, I'm a half-decent programmer and I can't make it through this book. If you liked the book or want to read it, that's great! But I <a href="https://notes.eatonphil.com/recommending-a-book.html">don't recommend</a> it to anyone.</p> <p>And many introductory Computer Science textbooks just don't make much sense to give to experienced programmers. For an experienced programmer, they can be quite slow!</p> <p>Most of the folks I see asking about how to learn functional programming are experienced programmers.</p> <h3 id="do-whatever-you-feel-like-doing">Do whatever you feel like doing</h3><p>I don't mean to overanalyze things, or get you overanalyzing things. If you want to learn functional programming by writing Haskell, that's awesome, you should go for it.</p> <p>Wanting to do something is basically the best motivation there is.</p> <p>The only reason I write this sort of post is so that folks who think that using Haskell or Standard ML or Scheme or reading SICP is the only way to learn functional programming see those ideas aren't necessarily true.</p> <h3 id="write-a-scheme!">Write a Scheme!</h3><p>Finally, for folks with time and motivation wanting to seriously work out their functional programming muscles, writing a Scheme implementation with a decent chunk of the standard library can be an immensely enjoyable project.</p> <p>You'll learn a lot about languages and compilers and algorithms and data structures. It's leetcode with meaning.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post on this idea about different things to think about when talking about learning functional programming<br><br>1. Core concepts (immutability, first-class functions, recursion)<br><br>2. Exploring programming languages<br><br>3. 
Teaching CS to students<a href="https://t.co/k4LzvnHbNs">https://t.co/k4LzvnHbNs</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1691617741764018430?ref_src=twsrc%5Etfw">August 16, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-08-15-thinking-about-functional-programming.htmlTue, 15 Aug 2023 00:00:00 +0000We put a distributed database in the browser – and made a game of ithttp://notes.eatonphil.com/2023-07-11-we-put-a-distributed-database-in-the-browse.html<head> <meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser/'" /> </head><p>This is an external post of mine. Click <a href="https://tigerbeetle.com/blog/2023-07-11-we-put-a-distributed-database-in-the-browser/">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2023-07-11-we-put-a-distributed-database-in-the-browse.htmlTue, 11 Jul 2023 00:00:00 +0000Metaprogramming in Zig and parsing CSShttp://notes.eatonphil.com/2023-06-19-metaprogramming-in-zig-and-parsing-css.html<p>I knew Zig supported some sort of reflection on types. But I had been confused about how to use it. What's the difference between <code>@typeInfo</code> and <code>@TypeOf</code>? I ignored this aspect of Zig until a problem came up at <a href="https://tigerbeetle.com">work</a> where reflection made sense.</p> <p>The situation was parsing and storing parsed fields in a struct. Each field name that is parsed should match up to a struct field.</p> <p>This is a fairly common problem. So this post walks through how to use Zig's metaprogramming features in a simpler but related domain: parsing CSS into typed objects, and pretty-printing these typed CSS objects.</p> <p>I live-streamed the implementation of this project yesterday on <a href="https://www.twitch.tv/eatonphil">Twitch</a>. 
The video is <a href="https://youtube.com/@eatonphil">available on YouTube</a>. And the source is <a href="https://github.com/eatonphil/zig-metaprogramming-css-parser">available on GitHub</a>.</p> <p>If you want to skip the parsing steps and just see the metaprogramming, jump to the implementation of <a href="#&lt;code&gt;match_property&lt;/code&gt;">match_property</a>.</p> <h3 id="parsing-css">Parsing CSS</h3><p>Let's imagine a CSS that only has alphabetical selectors, property names and values.</p> <p>The following would be valid:</p> <div class="highlight"><pre><span></span><span class="nt">div</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">background</span><span class="p">:</span><span class="w"> </span><span class="kc">black</span><span class="p">;</span> <span class="w"> </span><span class="k">color</span><span class="p">:</span><span class="w"> </span><span class="kc">white</span><span class="p">;</span> <span class="p">}</span> <span class="nt">a</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">color</span><span class="p">:</span><span class="w"> </span><span class="kc">blue</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Thinking about the structure of this stripped-down CSS we've got:</p> <ol> <li>CSS properties that consist of property names and values (in our case the property names are limited to <code>background</code> and <code>color</code>)</li> <li>CSS rules that have a selector and a list of properties</li> <li>CSS sheets that have a list of rules</li> </ol> <p>Turning that into Zig in <code>main.zig</code>:</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span
class="kr">const</span><span class="w"> </span><span class="n">CSSProperty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">unknown</span><span class="o">:</span><span class="w"> </span><span class="kt">void</span><span class="p">,</span> <span class="w"> </span><span class="n">color</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">background</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">};</span> <span class="kr">const</span><span class="w"> </span><span class="n">CSSRule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">selector</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">properties</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">CSSProperty</span><span class="p">,</span> <span class="p">};</span> <span class="kr">const</span><span class="w"> </span><span class="n">CSSSheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rules</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span 
class="n">CSSRule</span><span class="p">,</span> <span class="p">};</span> </pre></div> <p>The parser is going to look for CSS rules, each of which contains a selector and a list of CSS properties. The entrypoint is that simple:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span> <span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSSheet</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rules</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CSSRule</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span> <span class="w"> </span><span class="c1">// Parse rules until EOF.</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span
class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_rule</span><span class="p">(</span><span class="n">arena</span><span class="p">,</span><span class="w"> </span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">rules</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">rule</span><span class="p">);</span> <span class="w"> </span><span class="c1">// In case there is trailing whitespace before the EOF,</span> <span class="w"> </span><span class="c1">// eating whitespace here makes sure we exit the loop</span> <span class="w"> </span><span class="c1">// immediately before trying to parse more rules.</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> 
</span><span class="n">CSSSheet</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">rules</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rules</span><span class="p">.</span><span class="n">items</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Let's implement the <code>eat_whitespace</code> helper we've referenced. It increments a cursor into the css file while it sees whitespace.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_index</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ascii</span><span class="p">.</span><span class="n">isWhitespace</span><span class="p">(</span><span class="n">css</span><span 
class="p">[</span><span class="n">index</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>In our stripped-down version of CSS all we have to think about is ASCII. So the builtin <code>std.ascii.isWhitespace()</code> function is perfect.</p> <p>Next, parsing CSS rules.</p> <h4 id="<code>parse_rule()</code>"><code>parse_rule()</code></h4><p>A rule consists of a selector, opening curly braces, any number of properties, and closing curly braces. We need to remember to eat whitespace between each piece of syntax.</p> <p>And we'll reference a few more parsing helpers we'll talk about next for the selector, braces, and properties.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParseRuleResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rule</span><span class="o">:</span><span class="w"> </span><span class="n">CSSRule</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">};</span> <span class="k">fn</span><span class="w"> </span><span class="n">parse_rule</span><span class="p">(</span> <span class="w"> </span><span class="n">arena</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span 
class="p">.</span><span class="n">ArenaAllocator</span><span class="p">,</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParseRuleResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// First parse selector(s).</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">selector_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">selector_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span 
class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Then parse opening curly brace: {.</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&#39;{&#39;</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">());</span> <span class="w"> </span><span class="c1">// Then parse any number of properties.</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">css</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&#39;}&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">attr_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_property</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">attr_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">attr_res</span><span class="p">.</span><span class="n">property</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Then parse closing curly brace: }.</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&#39;}&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParseRuleResult</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">rule</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CSSRule</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">selector</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">selector_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">properties</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">properties</span><span class="p">.</span><span class="n">items</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> 
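<p>Before moving on to the remaining helpers, here is a minimal sketch (not part of the original project) of how <code>eat_whitespace</code> could be sanity-checked in isolation with Zig's built-in <code>test</code> blocks. The helper is repeated so the snippet stands alone; the test names and inputs are made up for illustration, and it assumes the snippet is saved in its own file and run with <code>zig test</code>.</p>
<div class="highlight"><pre><span></span>const std = @import(&quot;std&quot;);

// Same helper as above, repeated so this file is self-contained.
fn eat_whitespace(css: []const u8, initial_index: usize) usize {
    var index = initial_index;
    while (index &lt; css.len and std.ascii.isWhitespace(css[index])) {
        index += 1;
    }
    return index;
}

test &quot;eat_whitespace stops at the first non-whitespace byte&quot; {
    // Space, newline, tab, then the selector starts at index 3.
    try std.testing.expectEqual(@as(usize, 3), eat_whitespace(&quot; \n\tdiv {}&quot;, 0));
}

test &quot;eat_whitespace never runs past the end of the input&quot; {
    // All-whitespace input: the cursor ends up equal to css.len.
    try std.testing.expectEqual(@as(usize, 2), eat_whitespace(&quot;  &quot;, 0));
    // Already at EOF: the cursor is returned unchanged.
    try std.testing.expectEqual(@as(usize, 1), eat_whitespace(&quot;a&quot;, 1));
}
</pre></div>
<p>The second test covers the case the top-level <code>parse</code> loop depends on: trailing whitespace at the end of the file leaves the cursor at <code>css.len</code>, so the <code>while (index &lt; css.len)</code> loop exits instead of trying to parse another rule.</p>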
<p>The <code>parse_syntax</code> helper is pretty simple: it does a bounds check and advances the cursor by one if the current character matches the one you pass in.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="w"> </span><span class="n">syntax</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">initial_index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">css</span><span class="p">[</span><span class="n">initial_index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">initial_index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">debug_at</span><span 
class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected syntax: &#39;{c}&#39;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">syntax</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">NoSuchSyntax</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>This calls attention to debugging messages on failure. When we fail to parse a syntax, we want to give a useful error message and point at the exact line and column of code where the error happens.</p> <p>So let's implement <code>debug_at</code>.</p> <h4 id="<code>debug_at</code>"><code>debug_at</code></h4><p>First, we iterate over the entire CSS source code until we find the entire line that contains the index where the parser failed. 
We also want to identify the exact line and column corresponding to that index.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">debug_at</span><span class="p">(</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="w"> </span><span class="kr">comptime</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">args</span><span class="o">:</span><span class="w"> </span><span class="n">anytype</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line_no</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">col_no</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line_beginning</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">found_line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">css</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\n&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">found_line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span 
class="p">;</span> <span class="w"> </span><span class="n">line_beginning</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="n">line_no</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">found_line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">found_line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we print it all out in a nice format for users (which will likely just be ourselves).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span 
class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Error at line {}, column {}. &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">line_no</span><span class="p">,</span><span class="w"> </span><span class="n">col_no</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="n">msg</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">css</span><span class="p">[</span><span class="n">line_beginning</span><span class="p">..</span><span class="n">i</span><span class="p">]});</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">col_no</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">col_no</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;^ Near here.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="p">}</span> </pre></div> <p>Ok, popping our mental stack, if we look back at <code>parse_rule</code> we still need to implement <code>parse_identifier</code> and <code>parse_property</code>.</p> <h4 id="<code>parse_identifier</code>"><code>parse_identifier</code></h4><p>An "identifier" for us here is just going to be an ASCII alphabetical string (i.e. <code>[a-zA-Z]+</code>). 
We're going to <em>really</em> simplify CSS because we'll use this function to parse not just selectors but also property names and even property values.</p> <p>Zig again has a nice standard library function, <code>std.ascii.isAlphabetic</code>, we can use.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParseIdentifierResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">identifier</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">};</span> <span class="k">fn</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParseIdentifierResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_index</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> 
</span><span class="o">&lt;</span><span class="w"> </span><span class="n">css</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ascii</span><span class="p">.</span><span class="n">isAlphabetic</span><span class="p">(</span><span class="n">css</span><span class="p">[</span><span class="n">index</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">initial_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">debug_at</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid identifier.&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">InvalidIdentifier</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParseIdentifierResult</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">css</span><span class="p">[</span><span class="n">initial_index</span><span class="p">..</span><span class="n">index</span><span class="p">],</span> <span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>In reality, CSS selectors are <a href="https://www.w3schools.com/cssref/css_selectors.php">highly complex</a>. Parsing CSS correctly isn't the main aim of this post though. :)</p> <h4 id="<code>parse_property</code>"><code>parse_property</code></h4><p>The final piece of CSS we need to parse is properties. These consist of a property name, then a colon, then a property value, and finally a semicolon. Around the name and the colon we also eat whitespace.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">ParsePropertyResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">property</span><span class="o">:</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="p">};</span> <span class="k">fn</span><span class="w"> </span><span class="n">parse_property</span><span class="p">(</span> <span class="w"> </span><span class="n">css</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">initial_index</span><span class="o">:</span><span class="w"> </span><span 
class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">ParsePropertyResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// First parse property name.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">name_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Could not parse property name.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name_res</span><span class="p">.</span><span class="n">index</span><span 
class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Then parse colon: :.</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Then parse property value.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value_res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_identifier</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span 
class="s">&quot;Could not parse property value.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value_res</span><span class="p">.</span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Finally parse semi-colon: ;.</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parse_syntax</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&#39;;&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">property</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span><span class="n">name_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span><span class="w"> </span><span class="n">value_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="o">|</span><span class="n">e</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">debug_at</span><span class="p">(</span><span class="n">css</span><span class="p">,</span><span class="w"> </span><span class="n">initial_index</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Unknown property: &#39;{s}&#39;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name_res</span><span class="p">.</span><span class="n">identifier</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">e</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">ParsePropertyResult</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">property</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">property</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Finally we get to the first bit of metaprogramming. 
Once we have a property name and value, we need to turn them into a Zig union.</p> <p>That's what <code>match_property()</code> is responsible for.</p> <h3 id="<code>match_property</code>"><code>match_property</code></h3><p>This function takes a property name and value and returns a <code>CSSProperty</code> whose active field is the one matching the property name, set to the value passed in.</p> <p>If we didn't have metaprogramming or reflection, the implementation might look like this:</p> <div class="highlight"><pre><span></span><span class="n">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;color&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">{</span><span class="o">.</span><span 
class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;background&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">CSSProperty</span><span class="p">{</span><span class="o">.</span><span class="n">background</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">error</span><span class="o">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And that is not necessarily bad. In fact it may be how a lot of production code looks over time as product needs evolve. 
It also lets you keep the internal field name independent of the external property name.</p> <p>However, for the sake of learning, we'll try to implement the same thing with Zig metaprogramming.</p> <p>To understand the reflection APIs, we can take a look at <a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig">lib/std/json/static.zig</a> in the Zig standard library.</p> <p>Specifically, if we look at lines 210-226 of that file, we can see it iterating over the fields of a <code>Union</code>:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">.</span><span class="n">Union</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">unionInfo</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">meta</span><span class="p">.</span><span class="n">trait</span><span class="p">.</span><span class="n">hasFn</span><span class="p">(</span><span class="s">&quot;jsonParse&quot;</span><span class="p">)(</span><span class="n">T</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">T</span><span class="p">.</span><span class="n">jsonParse</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span 
class="n">tag_type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="nb">@compileError</span><span class="p">(</span><span class="s">&quot;Unable to parse into untagged union &#39;&quot;</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="nb">@typeName</span><span class="p">(</span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="o">++</span><span class="w"> </span><span class="s">&quot;&#39;&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">result</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">T</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">name_token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">nextAllocMax</span><span class="p">(</span><span class="n">allocator</span><span 
class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">alloc_if_needed</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">.</span><span class="n">max_value_len</span><span class="p">.</span><span class="o">?</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">field_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">name_token</span><span class="p">.</span><span class="o">?</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">allocated_string</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">slice</span><span class="o">|</span><span class="w"> </span><span class="n">slice</span><span class="p">,</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> </pre></div> <p>Then right after that 
(lines 226-243) we see them conditionally modifying the result object:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">unionInfo</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">field_name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Free the name token now in case we&#39;re using an allocator that optimizes freeing the last allocated object.</span> <span class="w"> </span><span class="c1">// (Recursing into parseInternal() might trigger more allocations.)</span> <span class="w"> </span><span class="n">freeAllocated</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">name_token</span><span class="p">.</span><span class="o">?</span><span class="p">);</span> <span class="w"> </span><span class="n">name_token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">u_field</span><span class="p">.</span><span 
class="kt">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kt">void</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// void isn&#39;t really a json type, but we can support void payload union tags with {} as a value.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_begin</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(.</span><span class="n">object_end</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnexpectedToken</span><span class="p">;</span> <span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="c1">// Recurse.</span> <span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">parseInternal</span><span class="p">(</span><span class="n">u_field</span><span class="p">.</span><span class="kt">type</span><span class="p">,</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">options</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>We can see that the <code>.Union =&gt; |unionInfo|</code> condition is entered by switching on <code>@typeInfo(T)</code> (<a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig#L149">line 149</a>) and that <code>T</code> is a type (<a href="https://github.com/ziglang/zig/blob/32cb9462ffa0a9df7a080d67eaf3a5762173f742/lib/std/json/static.zig#L144">line 144</a>).</p> <p>We don't have a generic type though. We know we are working with a <code>CSSProperty</code>. 
And we know <code>CSSProperty</code> is a union so we don't need the <code>switch</code> either.</p> <p>So let's apply that to our <code>match_property</code> implementation.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span 
class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And if we try to build that we'll get an error like this:</p> <div class="highlight"><pre><span></span><span class="n">main</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">15</span><span class="o">:</span><span class="mi">31</span><span class="o">:</span><span class="w"> </span><span class="k">error</span><span class="o">:</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="err">&#39;</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">Type</span><span class="p">.</span><span class="n">UnionField</span><span class="err">&#39;</span><span class="w"> </span><span class="n">must</span><span class="w"> </span><span class="n">be</span><span class="w"> </span><span class="kr">comptime</span><span class="o">-</span><span 
class="n">known</span><span class="p">,</span><span class="w"> </span><span class="n">but</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">is</span><span class="w"> </span><span class="n">runtime</span><span class="o">-</span><span class="n">known</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> </pre></div> <p>Zig's "reflection" abilities here are comptime only. The list of union fields only exists at compile time, so we can't walk it with a runtime <code>for</code> loop; we must use a comptime <code>inline for</code> loop.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span 
class="p">);</span> <span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>As far as I understand it, this loop is basically unrolled and the generated code would look a lot like our hard-coded initial version.</p> <p>i.e. 
it would probably look like this:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;background&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;background&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;color&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;color&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span 
class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Again, that's just how I imagine the compiler generates code from the union field reflection and the <code>inline for</code> over the fields.</p> <p>Now try compiling the <code>inline for</code> version. I get this:</p> <div class="highlight"><pre><span></span><span class="go">main.zig:17:58: error: expected type &#39;void&#39;, found &#39;[]const u8&#39;</span> <span class="go"> return @unionInit(CSSProperty, u_field.name, value);</span> </pre></div> <p>Thinking about the generated code makes it especially clear what's happening. We have an <code>unknown</code> field in there that has a <code>void</code> type. You can't assign a string to <code>void</code>.</p> <p>We know that at runtime the branch where that happens should be impossible, because the user shouldn't enter <code>unknown</code> as a property name. (Though now that I write this, I see they actually could. But let's pretend they wouldn't.)</p> <p>So the problem isn't a runtime failure but a comptime type-checking failure.</p> <p>Thankfully we can work around this with comptime conditionals.</p> <p>If we wrap our current condition in an additional conditional that is evaluated at comptime and filters out the <code>unknown</code> iteration of the <code>inline for</code> loop, the compiler shouldn't generate any code trying to assign to the <code>unknown</code> field.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span 
class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span> <span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">cssPropertyInfo</span><span class="p">.</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span 
class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And indeed, if you try to compile it, this works. 
Since the conditional is evaluated at compile time, we can imagine the code the compiler generates is this:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">match_property</span><span class="p">(</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="n">CSSProperty</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">cssPropertyInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;background&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;background&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;color&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">@unionInit</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;color&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">UnknownProperty</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>The <code>unknown</code> field has been skipped.</p> <p>In retrospect, I realize that the <code>unknown</code> field probably isn't even needed. We could eliminate it from the <code>CSSProperty</code> union and get rid of that comptime conditional. However, sometimes there are in fact private fields you want to skip. 
And I wanted to show how to deal with that case.</p> <p>For the last bit of metaprogramming, let's talk about displaying the resulting <code>CSSSheet</code> we'd get after parsing.</p> <h3 id="<code>sheet.display()</code>"><code>sheet.display()</code></h3><p>If we didn't have metaprogramming and wanted to display the sheet, we'd have to switch on every possible union field.</p> <p>Like so (I've modified the <code>CSSSheet</code> struct definition so it includes this method):</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">sheet</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">CSSSheet</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">sheet</span><span class="p">.</span><span class="n">rules</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">rule</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;selector: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">rule</span><span class="p">.</span><span class="n">selector</span><span class="p">});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">rule</span><span class="p">.</span><span class="n">properties</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span 
class="n">property</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">property</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">unknown</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">color</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">color_value</span><span class="o">|</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; color: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">color_value</span><span class="p">}),</span> <span class="w"> </span><span class="p">.</span><span class="n">background</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">background_value</span><span class="o">|</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; background: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">background_value</span><span class="p">}),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span 
class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>This is already a little annoying and could get unwieldy as we add fields to the <code>CSSProperty</code> union.</p> <p>Instead, we can use <code>inline for (@typeInfo(CSSProperty).Union.fields) |u_field|</code> to iterate over all fields of the union, skip the <code>unknown</code> field at comptime, and print the field name and value when the field's name matches the active tag of <code>property</code>, which the <code>@tagName</code> builtin gives us as a string.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">sheet</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">CSSSheet</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">sheet</span><span class="p">.</span><span class="n">rules</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">rule</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;selector: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">rule</span><span 
class="p">.</span><span class="n">selector</span><span class="p">});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">rule</span><span class="p">.</span><span class="n">properties</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">property</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">inline</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="nb">@typeInfo</span><span class="p">(</span><span class="n">CSSProperty</span><span class="p">).</span><span class="n">Union</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">u_field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span 
class="n">name</span><span class="p">,</span><span class="w"> </span><span class="nb">@tagName</span><span class="p">(</span><span class="n">property</span><span class="p">)))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; {s}: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="nb">@tagName</span><span class="p">(</span><span class="n">property</span><span class="p">),</span> <span class="w"> </span><span class="nb">@field</span><span class="p">(</span><span class="n">property</span><span class="p">,</span><span class="w"> </span><span class="n">u_field</span><span class="p">.</span><span class="n">name</span><span class="p">),</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h3 id="<code>main</code>"><code>main</code></h3><p>Finally, we pull it all together with a little <code>main</code> function.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span 
class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Let&#39;s read in a CSS file.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Skips the program name.</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">file_name</span><span 
class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">f</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">file_name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">f</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFile</span><span class="p">(</span><span class="n">file_name</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">getEndPos</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span 
class="n">css_file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">file_size</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="n">css_file</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">sheet</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="o">&amp;</span><span class="n">arena</span><span class="p">,</span><span class="w"> </span><span class="n">css_file</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="n">sheet</span><span class="p">.</span><span class="n">display</span><span class="p">();</span> <span class="p">}</span> </pre></div> <p>And try it against some tests.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>zig<span class="w"> </span>build-exe<span class="w"> </span>main.zig <span class="gp">$ </span>cat<span class="w"> </span>tests/basic.css <span class="go">div {</span> <span class="go"> background: white;</span> <span class="go">}</span> <span class="gp">$ </span>./main<span class="w"> </span>tests/basic.css <span class="go">selector: div</span> <span class="go"> background: white</span> </pre></div> <p>Nice! 
Let's try it against a more complex test.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>tests/multiple-blocks.css <span class="go">div {</span> <span class="go"> background: black;</span> <span class="go"> color: white;</span> <span class="go">}</span> <span class="go">a {</span> <span class="go"> color: blue;</span> <span class="go">}</span> <span class="gp">$ </span>./main<span class="w"> </span>tests/multiple-blocks.css <span class="go">selector: div</span> <span class="go"> background: black</span> <span class="go"> color: white</span> <span class="go">selector: a</span> <span class="go"> color: blue</span> </pre></div> <p>Awesome. And against a bad CSS sheet:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>cat<span class="w"> </span>tests/bad-property.css <span class="go">a {</span> <span class="go"> big: pink;</span> <span class="go">}</span> <span class="gp">$ </span>./main<span class="w"> </span>tests/bad-property.css <span class="go">Error at line 2, column 4. 
Unknown property: &#39;big&#39;.</span> <span class="go"> big: pink;</span> <span class="go"> ^ Near here.</span> </pre></div> <p>We've got it!</p> <h3 id="addendum:-<code>@field</code>">Addendum: <code>@field</code></h3><p>The docs were quite clear about using <code>@field(object, fieldName)</code> to access the value of an <code>object</code> of type <code>@TypeOf(object)</code> at field <code>fieldName</code>.</p> <p>And the docs do mention that <code>@field()</code> can be used as the left-hand side (LHS) of an assignment, but that only really struck me when I was browsing the Zig JSON code, like at <a href="https://github.com/ziglang/zig/blob/master/lib/std/json/static.zig#L307">line 307</a>.</p> <p>I didn't use that in this little project but I've used it elsewhere, so I wanted to call this LHS behavior out.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post on parsing CSS as a way to motivate some basic exploration of metaprogramming in Zig.<br><br>I heavily referenced Zig&#39;s builtin JSON parser when learning this. And it is referenced multiple times in the post as well.<a href="https://t.co/CX6jXSLGiR">https://t.co/CX6jXSLGiR</a> <a href="https://t.co/jAJJZ0pONQ">pic.twitter.com/jAJJZ0pONQ</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1670868544953647129?ref_src=twsrc%5Etfw">June 19, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-06-19-metaprogramming-in-zig-and-parsing-css.htmlMon, 19 Jun 2023 00:00:00 +0000Implementing the Raft distributed consensus protocol in Gohttp://notes.eatonphil.com/2023-05-25-raft.html<p>As part of bringing myself up to speed after joining <a href="https://tigerbeetle.com/">TigerBeetle</a>, I wanted some background on how distributed consensus and replicated state machine protocols work. TigerBeetle uses <a href="https://pmg.csail.mit.edu/papers/vr-revisited.pdf">Viewstamped Replication</a>. 
But I wanted to understand all popular protocols and I decided to start with <a href="https://raft.github.io/">Raft</a>.</p> <p>We'll implement two key components of Raft in this post (leader election and log replication). Around 1k lines of Go. It took me around 7 months of sporadic studying to come to (what I hope is) an understanding of the basics.</p> <p><strong>Disclaimer</strong>: I'm not an expert. My implementation isn't yet hooked up to <a href="https://github.com/jepsen-io/jepsen">Jepsen</a>. I've run it through a mix of <a href="https://github.com/eatonphil/goraft/tree/main#distributed-key-value-store-api">manual</a> and <a href="https://github.com/eatonphil/goraft/tree/main/cmd/stress">automated tests</a> and it seems generally correct. This is not intended to be used in production. It's just for my education.</p> <p>All code for this project is <a href="https://github.com/eatonphil/goraft">available on GitHub</a>.</p> <p>Let's dig in!</p> <h3 id="the-algorithm">The algorithm</h3><p><a href="https://raft.github.io/raft.pdf">The Raft paper</a> itself is quite readable. Give it a read and you'll get the basic idea.</p> <p>The gist is that nodes in a cluster conduct elections to pick a leader. Users of the Raft cluster send messages to the leader. The leader passes the message to followers and waits for a majority to store the message. Once the message is committed (majority consensus has been reached), the message is applied to a state machine the user supplies. Followers learn about the latest committed message from the leader and apply each new committed message to their local user-supplied state machine.</p> <p>There's more to it including reconfiguration and snapshotting, which I won't get into in this post. 
But you can get the gist of Raft by thinking about 1) leader election and 2) replicated logs powering replicated state machines.</p> <h3 id="modeling-with-state-machines-and-key-value-stores">Modeling with state machines and key-value stores</h3><p>I've written before about how you can <a href="https://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.html">build a key-value store on top of Raft</a>. How you can <a href="https://notes.eatonphil.com/zigrocks-sql.html">build a SQL database on top of a key-value store</a>. And how you can build a <a href="https://notes.eatonphil.com/distributed-postgres.html">distributed SQL database on top of Raft</a>.</p> <p>This post will start out much like that first post, except that we won't stop at the Raft layer.</p> <h3 id="a-distributed-key-value-store">A distributed key-value store</h3><p>To use the Raft library we'll build, we need to create a state machine and the commands that are sent to it.</p> <p>Our state machine will have two operations: get the value for a key, and set a key to a value.</p> <p>This will go in <code>cmd/kvapi/main.go</code>.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="nx">crypto</span><span class="w"> </span><span class="s">&quot;crypto/rand&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/binary&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;math/rand&quot;</span> <span class="w"> </span><span class="s">&quot;net/http&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span 
class="s">&quot;strconv&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="w"> </span><span class="s">&quot;sync&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/eatonphil/goraft&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kt">int</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">commandKind</span><span class="w"> </span><span class="kt">uint8</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">setCommand</span><span class="w"> </span><span class="nx">commandKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">getCommand</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">commandKind</span> <span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span 
class="nx">statemachine</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decodeCommand</span><span class="p">(</span><span class="nx">cmd</span><span class="p">)</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">setCommand</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">getCommand</span><span class="p">:</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">c</span><span 
class="p">.</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Key not found&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">value</span><span class="p">.(</span><span class="kt">string</span><span class="p">)),</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown command: %x&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">cmd</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>But the Raft library we'll build needs to deal with various state machines. 
So commands passed from the user into the Raft cluster must be serialized to bytes.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="nb">uint8</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">,</span><span 
class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">msg</span><span class="p">.</span><span class="nx">WriteString</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()</span> <span class="p">}</span> </pre></div> <p>And the <code>Apply()</code> function from above needs to be able to decode the bytes:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">decodeCommand</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">commandKind</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="nx">keyLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="mi">9</span><span class="p">])</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">setCommand</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">valLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="p">])</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">msg</span><span class="p">[</span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">9</span><span class="o">+</span><span class="nx">keyLen</span><span class="o">+</span><span class="mi">8</span><span class="o">+</span><span class="nx">valLen</span><span class="p">])</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span> <span class="p">}</span> </pre></div> <h4 id="http-api">HTTP API</h4><p>Now that we've modeled the key-value store as a state machine, let's build the HTTP endpoints that allow the user to operate the state machine through the Raft cluster.</p> <p>First, let's implement the <code>set</code> operation. We need to grab the key and value the user passes in and call <code>Apply()</code> on the Raft cluster. Calling <code>Apply()</code> on the Raft cluster will eventually call the <code>Apply()</code> function we just wrote, but not until the message sent to the Raft cluster is actually replicated.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">raft</span><span class="w"> </span><span class="o">*</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">Server</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="p">}</span> <span class="c1">// Example:</span> <span class="c1">//</span> <span class="c1">// curl 'http://localhost:2020/set?key=x&amp;value=1'</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">setHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span 
class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">setCommand</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;key&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;value&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([][]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="p">)})</span> <span class="w"> </span><span class="k">if</span><span 
class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not write key-value: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>To reiterate, we tell the Raft cluster we want this message replicated. The message contains the operation type (<code>set</code>) and the operation details (<code>key</code> and <code>value</code>). These messages are custom to the state machine we wrote. And they will be interpreted by the state machine we wrote, on each node in the cluster.</p> <p>Next we handle <code>get</code>-ing values from the cluster. There are two ways to do this. We already embed a local copy of the distributed key-value map. We could just read from that map in the current process. But it might not be up-to-date or correct. It would be fast to read though. 
And convenient for debugging.</p> <p>But the only <a href="https://github.com/etcd-io/etcd/issues/741"><em>correct</em> way to read from a Raft cluster</a> is to pass the read through the log replication too.</p> <p>So we'll support both.</p> <div class="highlight"><pre><span></span><span class="c1">// Example:</span> <span class="c1">//</span> <span class="c1">// curl 'http://localhost:2020/get?key=x'</span> <span class="c1">// 1</span> <span class="c1">// curl 'http://localhost:2020/get?key=x&amp;relaxed=true' # Skips consensus for the read.</span> <span class="c1">// 1</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">getHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="nx">command</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">getCommand</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span 
class="p">(</span><span class="s">&quot;key&quot;</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;relaxed&quot;</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Key not found&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">v</span><span class="p">.(</span><span class="kt">string</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">[]</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ApplyResult</span> <span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([][]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">encodeCommand</span><span class="p">(</span><span class="nx">c</span><span class="p">)})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected single response from Raft, got: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Error</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Error</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">results</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">Result</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span 
class="s">&quot;Could not read key-value: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">value</span><span class="p">[</span><span class="nx">written</span><span class="p">:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not encode key-value in http response: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">written</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="main">Main</h4><p>Now that we've set up our custom state machine and our HTTP API for interacting with the Raft cluster, we'll tie it together with reading configuration from the command-line and actually starting the Raft node and the HTTP API.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ClusterMember</span> <span class="w"> </span><span 
class="nx">index</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">http</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--node&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">node</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Expected $value to be a valid integer in `--node $value`, got: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">node</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--http&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span 
class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--cluster&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">clusterEntry</span><span class="w"> </span><span class="nx">goraft</span><span class="p">.</span><span class="nx">ClusterMember</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;;&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idAddress</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">part</span><span class="p">,</span><span class="w"> 
</span><span class="s">&quot;,&quot;</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">clusterEntry</span><span class="p">.</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Expected $id to be a valid integer in `--cluster $id,$ip`, got: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">clusterEntry</span><span class="p">.</span><span class="nx">Address</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">idAddress</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">=</span><span class="w"> 
</span><span class="nb">append</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="nx">clusterEntry</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --node $index&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --http $address&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --cluster $node1Id,$node1Address;...;$nodeNId,$nodeNAddress&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">[</span><span class="mi">8</span><span class="p">]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">crypto</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">[:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;cannot seed math/rand package with cryptographically secure random number generator&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">rand</span><span class="p">.</span><span class="nx">Seed</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">binary</span><span class="p">.</span><span 
class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">b</span><span class="p">[:])))</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sm</span><span class="w"> </span><span class="nx">statemachine</span> <span class="w"> </span><span class="nx">sm</span><span class="p">.</span><span class="nx">db</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">db</span> <span class="w"> </span><span class="nx">sm</span><span class="p">.</span><span class="nx">server</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">goraft</span><span class="p">.</span><span class="nx">NewServer</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">sm</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span 
class="nx">Start</span><span class="p">()</span> <span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">db</span><span class="p">}</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/set&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">setHandler</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/get&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">getHandler</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">http</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And that's it for the easy part: a distributed key-value store on top of 
a Raft cluster.</p> <p>Next we need to implement Raft.</p> <h3 id="a-raft-server">A Raft server</h3><p>If we take a look at Figure 2 in the Raft paper, we get an idea for all the state we need to model.</p> <p><img src="/assets/raft-figure-2.png" alt="Raft Figure 2"></p> <p>We'll dig into the details as we go. But for now let's turn that model into a few Go types. This goes in <code>raft.go</code> in the base directory, not <code>cmd/kvapi</code>.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">goraft</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bufio&quot;</span> <span class="w"> </span><span class="s">&quot;context&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/binary&quot;</span> <span class="w"> </span><span class="s">&quot;errors&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;math/rand&quot;</span> <span class="w"> </span><span class="s">&quot;net&quot;</span> <span class="w"> </span><span class="s">&quot;net/http&quot;</span> <span class="w"> </span><span class="s">&quot;net/rpc&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;path&quot;</span> <span class="w"> </span><span class="s">&quot;sync&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">StateMachine</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">cmd</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span 
class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Result</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">Error</span><span class="w"> </span><span class="kt">error</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Entry</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Command</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">Term</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Set by the primary so it can learn about the result of</span> <span class="w"> </span><span class="c1">// applying this command to the state machine</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ClusterMember</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Id</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">Address</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="c1">// Index of the next log entry to send</span> <span 
class="w"> </span><span class="nx">nextIndex</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Highest log entry known to be replicated</span> <span class="w"> </span><span class="nx">matchIndex</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Who was voted for in the most recent term</span> <span class="w"> </span><span class="nx">votedFor</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// TCP connection</span> <span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">*</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">Client</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="kt">string</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="nx">ServerState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;leader&quot;</span> <span class="w"> </span><span class="nx">followerState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;follower&quot;</span> <span class="w"> </span><span class="nx">candidateState</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;candidate&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// These variables for shutting down.</span> <span class="w"> </span><span class="nx">done</span><span class="w"> </span><span 
class="kt">bool</span> <span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span> <span class="w"> </span><span class="nx">Debug</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">mu</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span> <span class="w"> </span><span class="c1">// ----------- PERSISTENT STATE -----------</span> <span class="w"> </span><span class="c1">// The current term</span> <span class="w"> </span><span class="nx">currentTerm</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">log</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span> <span class="w"> </span><span class="c1">// votedFor is stored in `cluster []ClusterMember` below,</span> <span class="w"> </span><span class="c1">// mapped by `clusterIndex` below</span> <span class="w"> </span><span class="c1">// ----------- READONLY STATE -----------</span> <span class="w"> </span><span class="c1">// Unique identifier for this Server</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// The TCP address for RPC</span> <span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="c1">// When to start elections after no append entry messages</span> <span class="w"> </span><span class="nx">electionTimeout</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span> <span class="w"> </span><span class="c1">// How often to send empty messages</span> <span class="w"> </span><span class="nx">heartbeatMs</span><span class="w"> </span><span class="kt">int</span> 
<span class="w"> </span><span class="c1">// When to next send empty message</span> <span class="w"> </span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span> <span class="w"> </span><span class="c1">// User-provided state machine</span> <span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="nx">StateMachine</span> <span class="w"> </span><span class="c1">// Metadata directory</span> <span class="w"> </span><span class="nx">metadataDir</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="c1">// Metadata store</span> <span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">os</span><span class="p">.</span><span class="nx">File</span> <span class="w"> </span><span class="c1">// ----------- VOLATILE STATE -----------</span> <span class="w"> </span><span class="c1">// Index of highest log entry known to be committed</span> <span class="w"> </span><span class="nx">commitIndex</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Index of highest log entry applied to state machine</span> <span class="w"> </span><span class="nx">lastApplied</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Candidate, follower, or leader</span> <span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="nx">ServerState</span> <span class="w"> </span><span class="c1">// Servers in the cluster, including this one</span> <span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span> <span class="w"> </span><span class="c1">// Index of this server</span> <span class="w"> </span><span class="nx">clusterIndex</span><span class="w"> </span><span 
class="kt">int</span> <span class="p">}</span> </pre></div> <p>And let's build a constructor to initialize the state for all servers in the cluster, as well as local server state.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">NewServer</span><span class="p">(</span> <span class="w"> </span><span class="nx">clusterConfig</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span><span class="p">,</span> <span class="w"> </span><span class="nx">statemachine</span><span class="w"> </span><span class="nx">StateMachine</span><span class="p">,</span> <span class="w"> </span><span class="nx">metadataDir</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span> <span class="w"> </span><span class="nx">clusterIndex</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Explicitly make a copy of the cluster because we&#39;ll be</span> <span class="w"> </span><span class="c1">// modifying it in this server.</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">[]</span><span class="nx">ClusterMember</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">clusterConfig</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Id</span><span 
class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Id must not be 0.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cluster</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">cluster</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Server</span><span class="p">{</span> <span class="w"> </span><span class="nx">id</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">[</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span> <span class="w"> </span><span class="nx">address</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">[</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Address</span><span class="p">,</span> <span class="w"> </span><span class="nx">cluster</span><span class="p">:</span><span class="w"> </span><span class="nx">cluster</span><span class="p">,</span> <span class="w"> </span><span class="nx">statemachine</span><span class="p">:</span><span class="w"> </span><span class="nx">statemachine</span><span class="p">,</span> <span class="w"> </span><span class="nx">metadataDir</span><span class="p">:</span><span class="w"> </span><span class="nx">metadataDir</span><span class="p">,</span> <span class="w"> </span><span class="nx">clusterIndex</span><span class="p">:</span><span class="w"> 
</span><span class="nx">clusterIndex</span><span class="p">,</span> <span class="w"> </span><span class="nx">heartbeatMs</span><span class="p">:</span><span class="w"> </span><span class="mi">300</span><span class="p">,</span> <span class="w"> </span><span class="nx">mu</span><span class="p">:</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">Mutex</span><span class="p">{},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And add a few debugging and assertion helpers.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s [Id: %d, Term: %d] %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Format</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">RFC3339Nano</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span> 
<span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debug</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">s</span><span class="p">.</span><span class="nx">Debug</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">debugf</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">s</span><span class="p">.</span><span class="nx">Debug</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">))</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">warn</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;[WARN] &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">warnf</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">...</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">warn</span><span 
class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">))</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">Assert</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s. 
Got a = %#v, b = %#v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">Server_assert</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="kt">comparable</span><span class="p">](</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Assert</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">debugmsg</span><span class="p">(</span><span class="nx">msg</span><span class="p">),</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h3 id="persistent-state">Persistent state</h3><p>As Figure 2 says, <code>currentTerm</code>, <code>log</code>, and <code>votedFor</code> must be persisted to disk as they're edited.</p> <p>I like to start by doing the stupidest thing possible. 
So in the first version of this project I used <code>encoding/gob</code> to write these three fields to disk every time <code>s.persist()</code> was called.</p> <p>Here is what this first version looked like:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">persist</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Truncate</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gob</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">PersistentState</span><span class="p">{</span> <span class="w"> </span><span class="nx">CurrentTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span> <span class="w"> </span><span class="nx">Log</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span> <span class="w"> </span><span class="nx">VotedFor</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">votedFor</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sync</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;Persisted. Term: %d. Log Len: %d. Voted For: %s.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">votedFor</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>But doing so means the cost of every <code>s.persist()</code> call is a function of the size of the log, because the entire log is rewritten each time. And that was horrible for throughput.</p> <p>I also noticed that <code>encoding/gob</code> is pretty inefficient.</p> <p>For a simple struct like:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">X</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">A</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">B</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">C</span><span class="w"> </span><span class="kt">bool</span> <span class="p">}</span> </pre></div> <p><code>encoding/gob</code> uses <a href="https://play.golang.com/p/TUe9TDgaZOw">68 bytes to store that data when B has two entries</a>. If we wrote the encoder/decoder ourselves we could store that struct in 33 bytes (<code>8 (sizeof(A)) + 8 (sizeof(len(B))) + 16 (len(B) * sizeof(B[0])) + 1 (sizeof(C))</code>).</p> <p>It's not that <code>encoding/gob</code> is bad. 
It just likely has different constraints than we are party to.</p> <p>So I decided to swap out <code>encoding/gob</code> for simply binary encoding the fields and also, importantly, keeping track of exactly how many entries in the log must be written and only writing that many.</p> <h4 id="<code>s.persist()</code>"><code>s.persist()</code></h4><p>Here's what that looks like.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">4096</span> <span class="kd">const</span><span class="w"> </span><span class="nx">ENTRY_HEADER</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">16</span> <span class="kd">const</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">128</span> <span class="c1">// Must be called within s.mu.Lock()</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">persist</span><span class="p">(</span><span class="nx">writeLog</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span 
class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">writeLog</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">page</span><span class="w"> </span><span class="p">[</span><span class="nx">PAGE_SIZE</span><span class="p">]</span><span class="kt">byte</span> <span class="w"> </span><span class="c1">// Bytes 0 - 8: Current term</span> <span class="w"> </span><span class="c1">// Bytes 8 - 16: Voted for</span> <span class="w"> </span><span class="c1">// Bytes 16 - 24: Log length</span> <span class="w"> </span><span class="c1">// Bytes 4096 - N: Log</span> <span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[:</span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">)</span> <span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span 
class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">())</span> <span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="mi">24</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)))</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">page</span><span class="p">[:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Wrote full page&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">writeLog</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">newLogOffset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="nx">nNewEntries</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">PAGE_SIZE</span><span class="o">+</span><span class="nx">ENTRY_SIZE</span><span class="o">*</span><span class="nx">newLogOffset</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">bw</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewWriter</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryBytes</span><span class="w"> </span><span class="p">[</span><span 
class="nx">ENTRY_SIZE</span><span class="p">]</span><span class="kt">byte</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newLogOffset</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Bytes 0 - 8: Entry term</span> <span class="w"> </span><span class="c1">// Bytes 8 - 16: Entry command length</span> <span class="w"> </span><span class="c1">// Bytes 16 - ENTRY_SIZE: Entry command</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="o">-</span><span class="nx">ENTRY_HEADER</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;Command is too large (%d). 
Must be at most %d bytes.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">),</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="o">-</span><span class="nx">ENTRY_HEADER</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:</span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="p">)</span> <span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">PutUint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">],</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">)))</span> <span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">16</span><span class="p">:],</span><span class="w"> </span><span class="p">[]</span><span 
class="nb">byte</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Command</span><span class="p">))</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bw</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Wrote full entry&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">bw</span><span class="p">.</span><span class="nx">Flush</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Sync</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Persisted in %s. Term: %d. Log Len: %d (%d new). 
Voted For: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Sub</span><span class="p">(</span><span class="nx">t</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">())</span> <span class="p">}</span> </pre></div> <p>Again, the important thing is that only the entries that <em>need</em> to be written are written. We do that by <code>seek</code>-ing to the offset of the first entry that needs to be written.</p> <p>And we collect writes of entries in a <code>bufio.Writer</code> so we don't waste write syscalls. Don't forget to flush the buffered writer!</p> <p>And don't forget to flush all writes to disk with <code>fd.Sync()</code>.</p> <p class="note"> <code>ENTRY_SIZE</code> is something that I could see being configurable based on the workload. Some workloads truly need only 128 bytes. But a key-value store probably wants much more than that. This implementation doesn't try to handle the case of arbitrarily sized keys and values. 
</p><p>Lastly, a few helpers used in there:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">min</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="o">~</span><span class="kt">int</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">~</span><span class="kt">uint64</span><span class="p">](</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="nx">T</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">max</span><span class="p">[</span><span class="nx">T</span><span class="w"> </span><span class="o">~</span><span class="kt">int</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="o">~</span><span class="kt">uint64</span><span class="p">](</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">T</span><span class="p">)</span><span class="w"> </span><span class="nx">T</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span 
class="p">&gt;</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span> <span class="p">}</span> <span class="c1">// Must be called within s.mu.Lock()</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Invalid cluster&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span> <span class="p">}</span> </pre></div> <h4 id="<code>s.restore()</code>"><code>s.restore()</code></h4><p>Now let's do the reverse operation, restoring from disk. This will only be called once on startup.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">restore</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span 
class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">OpenFile</span><span class="p">(</span> <span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">metadataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;md_%d.dat&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">)),</span> <span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">O_SYNC</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_CREATE</span><span class="o">|</span><span class="nx">os</span><span class="p">.</span><span class="nx">O_RDWR</span><span class="p">,</span> <span class="w"> </span><span class="mo">0755</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Bytes 0 - 8: Current term</span> 
<span class="w"> </span><span class="c1">// Bytes 8 - 16: Voted for</span> <span class="w"> </span><span class="c1">// Bytes 16 - 24: Log length</span> <span class="w"> </span><span class="c1">// Bytes 4096 - N: Log</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">page</span><span class="w"> </span><span class="p">[</span><span class="nx">PAGE_SIZE</span><span class="p">]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">page</span><span class="p">[:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">ensureLog</span><span class="p">()</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Read full page&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">PAGE_SIZE</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[:</span><span class="mi">8</span><span class="p">])</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">]))</span> <span class="w"> </span><span class="nx">lenLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">page</span><span class="p">[</span><span class="mi">16</span><span class="p">:</span><span class="mi">24</span><span class="p">])</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lenLog</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> 
</span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nb">int64</span><span class="p">(</span><span class="nx">PAGE_SIZE</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="nx">Entry</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">lenLog</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryBytes</span><span class="w"> </span><span class="p">[</span><span class="nx">ENTRY_SIZE</span><span class="p">]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Read full entry&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">ENTRY_SIZE</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Bytes 0 - 8: Entry term</span> <span class="w"> </span><span class="c1">// Bytes 8 - 16: Entry command length</span> <span class="w"> </span><span class="c1">// Bytes 16 - ENTRY_SIZE: Entry command</span> <span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[:</span><span class="mi">8</span><span class="p">])</span> <span class="w"> </span><span class="nx">lenValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint64</span><span class="p">(</span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">8</span><span class="p">:</span><span class="mi">16</span><span class="p">])</span> <span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Command</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="nx">entryBytes</span><span class="p">[</span><span class="mi">16</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="mi">16</span><span class="o">+</span><span class="nx">lenValue</span><span class="p">]</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">ensureLog</span><span class="p">()</span> <span class="p">}</span> </pre></div> <p>And a few helpers it calls:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">ensureLog</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Always has at least one log entry.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{})</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="c1">// Must be called within s.mu.Lock()</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">id</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Server_assert</span><span 
class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Invalid cluster&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h3 id="the-main-loop">The main loop</h3><p>Now let's think about the main loop. Before starting the loop we need to 1) restore persistent state from disk and 2) kick off an RPC server so servers in the cluster can send and receive messages to and from each other.</p> <div class="highlight"><pre><span></span><span class="c1">// Make sure rand is seeded</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">Start</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">done</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">restore</span><span class="p">()</span> <span class="w"> </span><span 
class="nx">rpcServer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rpc</span><span class="p">.</span><span class="nx">NewServer</span><span class="p">()</span> <span class="w"> </span><span class="nx">rpcServer</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">address</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">mux</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">NewServeMux</span><span class="p">()</span> <span class="w"> </span><span class="nx">mux</span><span class="p">.</span><span class="nx">Handle</span><span class="p">(</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">DefaultRPCPath</span><span class="p">,</span><span class="w"> </span><span class="nx">rpcServer</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">server</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">http</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span><span class="nx">Handler</span><span class="p">:</span><span class="w"> </span><span class="nx">mux</span><span class="p">}</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">server</span><span class="p">.</span><span class="nx">Serve</span><span class="p">(</span><span class="nx">l</span><span class="p">)</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">done</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="k">return</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> </pre></div> <p>In the main loop we are either in the leader state, follower state or candidate state.</p> <p>All states will potentially receive RPC messages from other servers in the cluster but that won't be modeled in this main loop.</p> <p>The only thing going on in the main loop is that:</p> <ul> <li>We send heartbeat RPCs (leader state)</li> <li>We try to advance the commit index (leader state only) and apply commands to the state machine (leader and follower states)</li> <li>We trigger a new election if we haven't received a message in some time (candidate and follower states)</li> <li>Or we become the leader (candidate state)</li> </ul> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">state</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">leaderState</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeat</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">advanceCommitIndex</span><span class="p">()</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">followerState</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">timeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span 
class="p">.</span><span class="nx">advanceCommitIndex</span><span class="p">()</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">candidateState</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">timeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">becomeLeader</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}()</span> <span class="p">}</span> </pre></div> <p>Let's deal with leader election first.</p> <h3 id="leader-election">Leader election</h3><p>Leader election happens whenever a node hasn't received a message from a valid leader within its election timeout.</p> <p>I'll break this up into four major pieces:</p> <ol> <li>Timing out and becoming a candidate after a random (but bounded) period of time of not hearing a message from a valid leader: <code>s.timeout()</code>.</li> <li>The candidate requests votes from all other servers: <code>s.requestVote()</code>.</li> <li>All servers handle vote requests: <code>s.HandleRequestVoteRequest()</code>.</li> <li>A candidate that receives votes from a quorum of servers becomes the leader: <code>s.becomeLeader()</code>.</li> </ol> <p>On timing out, you increment <code>currentTerm</code>, vote for yourself and send RPC vote requests to the other nodes in the cluster.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">resetElectionTimeout</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">interval</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">(</span><span class="nx">rand</span><span class="p">.</span><span class="nx">Intn</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;New interval: %s.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">interval</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">electionTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Add</span><span class="p">(</span><span class="nx">interval</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">timeout</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span 
class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">hasTimedOut</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">After</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">electionTimeout</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">hasTimedOut</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;Timed out, starting new election.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">candidateState</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="o">++</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> 
</span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">requestVote</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Everything in there is implemented already except for <code>s.requestVote()</code>. Let's dig into that.</p> <h4 id="<code>s.requestvote()</code>"><code>s.requestVote()</code></h4><p>By referring back to Figure 2 from the Raft paper we can see how to model the request vote request and response. 
Let's turn that into some Go types.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Term</span><span class="w"> </span><span class="kt">uint64</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span> <span class="w"> </span><span class="c1">// Candidate requesting vote</span> <span class="w"> </span><span class="nx">CandidateId</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Index of candidate&#39;s last log entry</span> <span class="w"> </span><span class="nx">LastLogIndex</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Term of candidate&#39;s last log entry</span> <span class="w"> </span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="kt">uint64</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">RequestVoteResponse</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span> <span class="w"> </span><span class="c1">// True means candidate received vote</span> <span class="w"> </span><span class="nx">VoteGranted</span><span class="w"> </span><span class="kt">bool</span> <span class="p">}</span> </pre></div> <p>Now we just need to fill the <code>RequestVoteRequest</code> struct out and send it to each other node in the cluster in parallel. 
As we iterate through nodes in the cluster, we skip ourselves (we always immediately vote for ourselves).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">requestVote</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Requesting vote from %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span 
class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span> <span class="w"> </span><span class="nx">lastLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">Term</span> <span class="w"> </span><span class="nx">req</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span><span class="p">:</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">{</span> <span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">CandidateId</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span> <span class="w"> </span><span class="nx">LastLogIndex</span><span 
class="p">:</span><span class="w"> </span><span class="nx">lastLogIndex</span><span class="p">,</span> <span class="w"> </span><span class="nx">LastLogTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="nx">RequestVoteResponse</span> <span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Server.HandleRequestVoteRequest&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Will retry later</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now remember from Figure 2 in the Raft paper that we must always check that the RPC request and response are still valid. 
If the term of the response is greater than our own term, we must immediately stop processing and revert to follower state.</p> <p>Otherwise, we count the vote only if the response is still relevant to us (the response term is the same as the request term) <em>and</em> the vote was granted.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">rsp</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span 
class="nx">VoteGranted</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Vote granted by %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And that's it for the candidate side of requesting a vote.</p> <p>The implementation of <code>s.updateTerm()</code> is simple. 
It just takes care of transitioning to follower state if the term of an RPC message is greater than the node's current term.</p> <div class="highlight"><pre><span></span><span class="c1">// Must be called within a s.mu.Lock()</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">transitioned</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">Term</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nx">transitioned</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;Transitioned to follower&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">transitioned</span> <span class="p">}</span> </pre></div> <p>And the implementation of <code>s.rpcCall()</code> is a wrapper around <code>net/rpc</code> to lazily connect.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span 
class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">*</span><span class="nx">rpc</span><span class="p">.</span><span class="nx">Client</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rpc</span><span class="p">.</span><span class="nx">DialHTTP</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Address</span><span class="p">)</span> <span class="w"> </span><span class="nx">rpcClient</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">rpcClient</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="c1">// TODO: where/how to reconnect if the connection must be reestablished?</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rpcClient</span><span class="p">.</span><span class="nx">Call</span><span class="p">(</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">warnf</span><span class="p">(</span><span class="s">&quot;Error calling %s on %d: %s.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Let's dig into the other side of request vote: what happens when a node receives a vote request?</p> <h4 id="<code>s.handlevoterequest()</code>"><code>s.HandleRequestVoteRequest()</code></h4><p>First off, as discussed above, we must always check the RPC term versus our own and revert to follower if the term is greater than our own. (Remember that since this is an RPC request it could come to a server in any state: leader, candidate, or follower.)</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">HandleRequestVoteRequest</span><span class="p">(</span><span class="nx">req</span><span class="w"> </span><span class="nx">RequestVoteRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">*</span><span class="nx">RequestVoteResponse</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span 
class="p">(</span><span class="s">&quot;Received vote request from %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span> </pre></div> <p>Then we can return immediately if the request term is lower than our own (that means it's an old request).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Not granting vote request from %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;VoteGranted = false&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="p">,</span><span class="w"> </span><span 
class="kc">false</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And finally, we check that the requester's log is at least as up-to-date as our own and that we haven't already voted for someone else in this term, ourselves included.</p> <p>The first condition (up-to-date log) is easy to overlook in the Raft paper: it is the election restriction described in Section 5.4.1 and in the RequestVote receiver rules of Figure 2. The author of the paper also published a Raft TLA+ spec that does <a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla#L284">have it defined</a>.</p> <p>You might think the second condition could never be satisfied, since we already wrote the code that says a server votes for itself when it triggers an election. But since each server has a random election timeout, one server will usually start its election early enough to reach the others before they have timed out and voted for themselves, which allows them to vote for it.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">Term</span> <span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span 
class="nx">logOk</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogTerm</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lastLogTerm</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LastLogIndex</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nx">logLen</span><span class="p">)</span> <span class="w"> </span><span class="nx">grant</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">logOk</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getVotedFor</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span 
class="nx">CandidateId</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">grant</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Voted for %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">setVotedFor</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">VoteGranted</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Not granting vote request from %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">CandidateId</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Lastly, we need to address how the candidate who sent out vote requests actually becomes the leader.</p> <h4 id="<code>s.becomeleader()</code>"><code>s.becomeLeader()</code></h4><p>This is a relatively simple method. If we have a quorum of votes, we become the leader!</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">becomeLeader</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="o">/</span><span class="mi">2</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">votedFor</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">quorum</span><span class="o">--</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>There is a bit of bookkeeping we need to do like resetting <code>nextIndex</code> and <code>matchIndex</code> for each server (noted in Figure 2). And we also need to append a blank entry for the new term.</p> <p class="note"> Despite the section quoted below in code, I still don't understand why this blank entry is necessary. 
</p><div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Reset all cluster state</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Yes, even matchIndex is reset. 
Figure 2</span> <span class="w"> </span><span class="c1">// from Raft shows both nextIndex and</span> <span class="w"> </span><span class="c1">// matchIndex are reset after every election.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;New leader.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">leaderState</span> <span class="w"> </span><span class="c1">// From Section 8 Client Interaction:</span> <span class="w"> </span><span class="c1">// &gt; First, a leader must have the latest information on</span> <span class="w"> </span><span class="c1">// &gt; which entries are committed. The Leader</span> <span class="w"> </span><span class="c1">// &gt; Completeness Property guarantees that a leader has</span> <span class="w"> </span><span class="c1">// &gt; all committed entries, but at the start of its</span> <span class="w"> </span><span class="c1">// &gt; term, it may not know which those are. To find out,</span> <span class="w"> </span><span class="c1">// &gt; it needs to commit an entry from its term. 
Raft</span> <span class="w"> </span><span class="c1">// &gt; handles this by having each leader commit a blank</span> <span class="w"> </span><span class="c1">// &gt; no-op entry into the log at the start of its term.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{</span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span><span class="w"> </span><span class="nx">Command</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">})</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Triggers s.appendEntries() in the next tick of the</span> <span class="w"> </span><span class="c1">// main state loop.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we're done with elections!</p> <p>When I was working on this for the first time, I just stopped here and made sure I could get to a stable leader quickly. 
If it takes more than 1 term to establish a leader when you run three servers in the cluster on localhost, you've probably got a bug.</p> <p>In an ideal environment (which three processes on one machine most likely is), leadership should be established quite quickly and without many term changes. As the environment gets more adversarial (e.g. processes crash frequently or network latency is high and variable), leadership (and log replication) will take longer.</p> <p class="note"> But just because we have leader election working when the log is empty does not mean we'll have it working when we introduce log replication, since parts of voting depend on comparing logs. <br /> I had leader election working at one point, but it broke once I got log replication working, and it stayed broken until I found and fixed some more bugs in leader election. Of course, there may still be bugs even now. </p><h3 id="log-replication">Log replication</h3><p>I'll break up log replication into four major pieces:</p> <ol> <li>User submits a message to the leader to be replicated: <code>s.Apply()</code>.</li> <li>The leader sends uncommitted messages (messages from <code>nextIndex</code>) to all followers: <code>s.appendEntries()</code>.</li> <li>A follower receives an <code>AppendEntriesRequest</code> and stores new messages if appropriate, letting the leader know when it does store the messages: <code>s.HandleAppendEntriesRequest()</code>.</li> <li>The leader tries to update <code>commitIndex</code> for the last uncommitted message by seeing if it's been replicated on a quorum of servers: <code>s.advanceCommitIndex()</code>.</li> </ol> <p>Let's dig in, in that order.</p> <h4 id="<code>s.apply()</code>"><code>s.Apply()</code></h4><p>This is the entry point for a user of the cluster to attempt to get messages replicated into the cluster.</p> <p>It must be called on the current leader of the cluster. In the future the failure response might include the current leader. 
Or the user could submit messages in parallel to all nodes in the cluster and ignore <code>ErrApplyToLeader</code>. In the meantime we just assume the user can figure out which server in the cluster is the leader.</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">ErrApplyToLeader</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Cannot apply message to follower, apply to leader.&quot;</span><span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">commands</span><span class="w"> </span><span class="p">[][]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrApplyToLeader</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Processing %d new entry!&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span> </pre></div> <p>Next we'll store the message in the leader's log along with a Go channel that we must block on for the result of applying the message in the state machine after the message has been committed to the cluster.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">resultChans</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">command</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">commands</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">resultChans</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span 
class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">Entry</span><span class="p">{</span> <span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span> <span class="w"> </span><span class="nx">Command</span><span class="p">:</span><span class="w"> </span><span class="nx">command</span><span class="p">,</span> <span class="w"> </span><span class="nx">result</span><span class="p">:</span><span class="w"> </span><span class="nx">resultChans</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span> </pre></div> <p>Then we kick off the replication process (this will not block).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;Waiting to be applied!&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span 
class="p">.</span><span class="nx">appendEntries</span><span class="p">()</span> </pre></div> <p>And then we block until we receive results from each of the channels we created.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// TODO: What happens if this takes too long?</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">ApplyResult</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">wg</span><span class="w"> </span><span class="nx">sync</span><span class="p">.</span><span class="nx">WaitGroup</span> <span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Add</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">commands</span><span class="p">))</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">ch</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">resultChans</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="nx">results</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&lt;-</span><span class="nx">c</span> <span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Done</span><span class="p">()</span> <span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">ch</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">wg</span><span class="p">.</span><span class="nx">Wait</span><span class="p">()</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The interesting thing here is that appending entries is detached from the messages we just received. <code>s.appendEntries()</code> will probably include at least the messages we just appended to our log, but it might include more too if some servers are not very up-to-date. It may even include fewer messages than we just appended to our log, since we'll restrict the number of entries to send at one time so we keep latency down.</p> <h4 id="<code>s.appendentries()</code>"><code>s.appendEntries()</code></h4><p>This is the meat of log replication on the leader side. We send unreplicated messages to every other server in the cluster.</p> <p>By again referring back to Figure 2 from the Raft paper we can see how to model the append entries request and response. 
Let's turn that into some Go types too.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">AppendEntriesRequest</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span> <span class="w"> </span><span class="c1">// So follower can redirect clients</span> <span class="w"> </span><span class="nx">LeaderId</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Index of log entry immediately preceding new ones</span> <span class="w"> </span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Term of prevLogIndex entry</span> <span class="w"> </span><span class="nx">PrevLogTerm</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="c1">// Log entries to store. 
Empty for heartbeat.</span> <span class="w"> </span><span class="nx">Entries</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span> <span class="w"> </span><span class="c1">// Leader&#39;s commitIndex</span> <span class="w"> </span><span class="nx">LeaderCommit</span><span class="w"> </span><span class="kt">uint64</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">AppendEntriesResponse</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span> <span class="w"> </span><span class="c1">// true if follower contained entry matching prevLogIndex and</span> <span class="w"> </span><span class="c1">// prevLogTerm</span> <span class="w"> </span><span class="nx">Success</span><span class="w"> </span><span class="kt">bool</span> <span class="p">}</span> </pre></div> <p>For the method itself, we start optimistically by sending no entries, and we decrement <code>nextIndex</code> for a server each time that server fails to replicate messages. This means that we might eventually end up sending the entire log to one or all servers.</p> <p>We'll set a max number of entries to send per request so we avoid unbounded latency as followers store entries to disk. 
But we still want to send a large batch so that we amortize the cost of <code>fsync</code>.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">8</span><span class="nx">_000</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">appendEntries</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Don&#39;t need to send message to self</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span 
class="p">()</span> <span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span> <span class="w"> </span><span class="nx">prevLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nx">prevLogTerm</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">prevLogIndex</span><span class="p">].</span><span class="nx">Term</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">[]</span><span class="nx">Entry</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;len: %d, next: %d, server: %d&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">next</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span> <span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">next</span><span class="p">:]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Keep latency down by sending at most N entries at a time.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">entries</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">entries</span><span class="p">[:</span><span class="nx">MAX_APPEND_ENTRIES_BATCH</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">lenEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">))</span> <span class="w"> </span><span class="nx">req</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">AppendEntriesRequest</span><span class="p">{</span> <span class="w"> </span><span class="nx">RPCMessage</span><span class="p">:</span><span class="w"> </span><span class="nx">RPCMessage</span><span class="p">{</span> <span class="w"> </span><span class="nx">Term</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">LeaderId</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span> <span class="w"> </span><span class="nx">PrevLogIndex</span><span class="p">:</span><span class="w"> </span><span class="nx">prevLogIndex</span><span class="p">,</span> <span class="w"> </span><span class="nx">PrevLogTerm</span><span class="p">:</span><span class="w"> </span><span class="nx">prevLogTerm</span><span class="p">,</span> <span class="w"> </span><span class="nx">Entries</span><span class="p">:</span><span class="w"> </span><span class="nx">entries</span><span class="p">,</span> <span class="w"> </span><span class="nx">LeaderCommit</span><span class="p">:</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="nx">AppendEntriesResponse</span> <span class="w"> </span><span 
class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Sending %d entries to %d for term %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">entries</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span> <span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">rpcCall</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Server.HandleAppendEntriesRequest&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Will retry next tick</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now, as with every RPC request and response, we must check terms and potentially drop the message if it's outdated.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span 
class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">rsp</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">leaderState</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">dropStaleResponse</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Otherwise, if the message was successful, we'll update <code>matchIndex</code> (the last confirmed message stored on the follower) and <code>nextIndex</code> (the next likely message to send to the follower).</p> <p>If the message was not successful, we decrement <code>nextIndex</code>. 
Next time <code>s.appendEntries()</code> is called it will include one more previous message for this replica.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="o">+</span><span class="nx">lenEntries</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span 
class="p">(</span><span class="s">&quot;Message accepted for %d. Prev Index: %d, Next Index: %d, Match Index: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">,</span><span class="w"> </span><span class="nx">prev</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">matchIndex</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">max</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Forced to go back to %d for: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Id</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we're done with the leader side of append entries!</p> <h4 id="<code>s.handleappendentriesrequest()</code>"><code>s.HandleAppendEntriesRequest()</code></h4><p>Now for the follower side of log replication. This is, again, an RPC handler that could be called at any moment. So we may need to update the <code>term</code> (and transition to follower).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">HandleAppendEntriesRequest</span><span class="p">(</span><span class="nx">req</span><span class="w"> </span><span class="nx">AppendEntriesRequest</span><span class="p">,</span><span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">*</span><span class="nx">AppendEntriesResponse</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span 
class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">updateTerm</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">RPCMessage</span><span class="p">)</span> </pre></div> <p>"Hidden" in the "Candidates (§5.2):" section of Figure 2 is an additional rule:</p> <blockquote><p>If AppendEntries RPC received from new leader: convert to follower</p> </blockquote> <p>So we also need to handle that here. If we're still not a follower after that, we'll return immediately.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// From Candidates (§5.2) in Figure 2</span> <span class="w"> </span><span class="c1">// If AppendEntries RPC received from new leader: convert to follower</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">candidateState</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">followerState</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">followerState</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Non-follower cannot append entries.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next, we also return early if the request term is less than our own. 
This would represent an old request.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">currentTerm</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Dropping request from old leader %d: term %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderId</span><span class="p">,</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Not a valid leader.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now, finally, we know we're receiving a request from a valid leader. So we need to immediately bump the election timeout.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Valid leader so reset election.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">resetElectionTimeout</span><span class="p">()</span> </pre></div> <p>Then we do the log comparison to see if we can add the entries sent from this request. 
Specifically, we make sure that the entry in our log at <code>req.PrevLogIndex</code> exists and has the same term as <code>req.PrevLogTerm</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span> <span class="w"> </span><span class="nx">validPreviousLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="cm">/* This is the base case of the induction */</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">logLen</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="p">].</span><span class="nx">Term</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogTerm</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">validPreviousLog</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span 
class="nx">debug</span><span class="p">(</span><span class="s">&quot;Not a valid log.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next, we've got valid entries that we need to add to our log. This implementation is a little more complex because we'll make use of Go slice capacity so that <code>append()</code> never allocates.</p> <p>Importantly, we must truncate the log if a new entry ever conflicts with an existing one:</p> <blockquote><p>If an existing entry conflicts with a new one (same index but different terms), delete the existing entry and all that follow it (§5.3)</p> </blockquote> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">PrevLogIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nx">nNewEntries</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">next</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">));</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="nx">next</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">newTotal</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">Entries</span><span class="p">))</span> <span class="w"> </span><span class="c1">// Second argument must actually be `i`</span> <span class="w"> </span><span class="c1">// not `0` otherwise the copy after this</span> <span class="w"> </span><span class="c1">// doesn&#39;t work.</span> <span class="w"> </span><span class="c1">// Only copy until `i`, not `newTotal` since</span> <span class="w"> </span><span class="c1">// we&#39;ll continue appending after this.</span> <span class="w"> </span><span class="nx">newLog</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="nx">Entry</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span 
class="nx">newTotal</span><span class="o">*</span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">newLog</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newLog</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">prevCap</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span> <span class="w"> </span><span class="c1">// If an existing entry conflicts with a new</span> <span class="w"> </span><span class="c1">// one (same index but different terms),</span> <span class="w"> </span><span class="c1">// delete the existing entry and all that</span> 
<span class="w"> </span><span class="c1">// follow it (§5.3)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[:</span><span class="nx">i</span><span class="p">]</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Capacity remains the same while we truncated.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">cap</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">),</span><span class="w"> </span><span class="nx">prevCap</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Appending entry: %s. 
At index: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">e</span><span class="p">.</span><span class="nx">Command</span><span class="p">),</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Existing log is the same as new log&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Term</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Term</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span 
class="w"> </span><span class="nx">Server_assert</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Length is directly related to the index.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)),</span><span class="w"> </span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">nNewEntries</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next, we update the server's local <code>commitIndex</code> to the min of <code>req.LeaderCommit</code> and the index of the last entry in our own log.</p> <p>And finally we persist all these changes and mark the response as successful.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderCommit</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">min</span><span class="p">(</span><span class="nx">req</span><span class="p">.</span><span class="nx">LeaderCommit</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span 
class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">persist</span><span class="p">(</span><span class="nx">nNewEntries</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">nNewEntries</span><span class="p">)</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">.</span><span class="nx">Success</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>So the combined behavior of the leader and follower when replicating is that, for a follower not in sync with the leader, the search for a matching entry may eventually walk back to the beginning of the log, until the leader and follower agree on some first N messages of the log.</p> <h4 id="<code>s.advancecommitindex()</code>"><code>s.advanceCommitIndex()</code></h4><p>Once a quorum of servers (counting the leader itself), and not just one follower, has a matching first N messages, the leader can advance the cluster's <code>commitIndex</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">advanceCommitIndex</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> 
</span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="c1">// Leader can update commitIndex on quorum.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">state</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">leaderState</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">lastLogIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lastLogIndex</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">--</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">)</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">2</span><span 
class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">isLeader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">clusterIndex</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">cluster</span><span class="p">[</span><span class="nx">j</span><span class="p">].</span><span class="nx">matchIndex</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">isLeader</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">quorum</span><span class="o">--</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">quorum</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;New commit index: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And for every state a server might be in, if there are messages committed but not applied, we'll apply one here. And importantly, we'll pass the result back to the message's result channel if it exists, so that <code>s.Apply()</code> can learn about the result.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">commitIndex</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">log</span><span class="p">[</span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="p">]</span> <span class="w"> </span><span class="c1">// len(log.Command) == 0 is a noop committed by the leader.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span 
class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Command</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debugf</span><span class="p">(</span><span class="s">&quot;Entry applied: %d.&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="p">)</span> <span class="w"> </span><span class="c1">// TODO: what if Apply() takes too long?</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">statemachine</span><span class="p">.</span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Command</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Will be nil for follower entries and for no-op entries.</span> <span class="w"> </span><span class="c1">// Not nil for all user submitted messages.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">result</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="nx">ApplyResult</span><span class="p">{</span> <span class="w"> </span><span class="nx">Result</span><span class="p">:</span><span 
class="w"> </span><span class="nx">res</span><span class="p">,</span> <span class="w"> </span><span class="nx">Error</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">lastApplied</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="heartbeats">Heartbeats</h3><p>Heartbeats combine log replication and leader election. Heartbeats stave off leader election (follower timeouts). And heartbeats also bring followers up-to-date if they are behind.</p> <p>And it's a simple method. If it's time to heartbeat, we call <code>s.appendEntries()</code>. That's it.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">Server</span><span class="p">)</span><span class="w"> </span><span class="nx">heartbeat</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Lock</span><span class="p">()</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">mu</span><span class="p">.</span><span class="nx">Unlock</span><span class="p">()</span> <span class="w"> </span><span class="nx">timeForHeartbeat</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">After</span><span 
class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">timeForHeartbeat</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatTimeout</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">().</span><span class="nx">Add</span><span class="p">(</span><span class="nx">time</span><span class="p">.</span><span class="nx">Duration</span><span class="p">(</span><span class="nx">s</span><span class="p">.</span><span class="nx">heartbeatMs</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">debug</span><span class="p">(</span><span class="s">&quot;Sending heartbeat&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">appendEntries</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>The reason this staves off leader election is because any number of entries (0 or N) will come from a valid leader and will thus cause the followers to reset their election timeout.</p> <p>And that's the entirety of (the basics of) Raft.</p> <p>There are probably bugs.</p> <h3 id="running-kvapi">Running kvapi</h3><p>Now let's run the key-value API.</p> <div class="highlight"><pre><span></span><span class="gp">$ </span><span class="nb">cd</span><span class="w"> </span>cmd/kvapi<span class="w"> </span><span 
class="o">&amp;&amp;</span><span class="w"> </span>go<span class="w"> </span>build <span class="gp">$ </span>rm<span class="w"> </span>*.dat </pre></div> <h4 id="terminal-1">Terminal 1</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">0</span><span class="w"> </span>--http<span class="w"> </span>:2020<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">&quot;0,:3030;1,:3031;2,:3032&quot;</span> </pre></div> <h4 id="terminal-2">Terminal 2</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">1</span><span class="w"> </span>--http<span class="w"> </span>:2021<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">&quot;0,:3030;1,:3031;2,:3032&quot;</span> </pre></div> <h4 id="terminal-3">Terminal 3</h4><div class="highlight"><pre><span></span><span class="gp">$ </span>./kvapi<span class="w"> </span>--node<span class="w"> </span><span class="m">2</span><span class="w"> </span>--http<span class="w"> </span>:2022<span class="w"> </span>--cluster<span class="w"> </span><span class="s2">&quot;0,:3030;1,:3031;2,:3032&quot;</span> </pre></div> <h4 id="terminal-4">Terminal 4</h4><p>Remember that requests will go through the leader (unless we turn that off in the <code>/get</code> request). 
So you'll have to try sending a message to each server until you find the leader.</p> <p>To set a key:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span><span class="s2">&quot;http://localhost:2020/set?key=y&amp;value=hello&quot;</span> </pre></div> <p>To get a key:</p> <div class="highlight"><pre><span></span><span class="gp">$ </span>curl<span class="w"> </span>http://localhost:2020/get<span class="se">\?</span>key<span class="se">\=</span>y </pre></div> <p>And that's that! Try killing a server and restarting it. A new leader will be elected so you'll need to find the right one to send requests to again. But all existing entries should still be there.</p> <h3 id="a-test-rig">A test rig</h3><p>I won't cover the <a href="https://github.com/eatonphil/goraft/blob/main/cmd/sim/main.go">implementation of my test rig</a> in this post but I will describe it.</p> <p>It's nowhere near Jepsen but it does have a specific focus:</p> <ol> <li>Can the cluster elect a leader?</li> <li>Can the cluster store logs correctly?</li> <li>Can the cluster of three nodes tolerate one node down?</li> <li>How fast can it store N messages?</li> <li>Are messages recovered correctly when the nodes shut down and start back up?</li> <li>If a node's logs are deleted, is the log for that node recovered after it is restarted?</li> </ol> <p>This implementation passes these tests and handles around 20k-40k entries/second.</p> <h3 id="considerations">Considerations</h3><p>This was quite a challenging project. Normally when I hack on stuff like this I have TV (The Simpsons) on in the background. It's sort of dumb but this was the first project where I absolutely could not focus with that background noise.</p> <p>There are a tedious number of conditions and I am not sure I got them all (right). 
There are numerous ways to introduce subtle bugs.</p> <h4 id="race-conditions-and-deadlocks">Race conditions and deadlocks</h4><p>It's very easy to program in race conditions. Thankfully Go has the <code>-race</code> flag, which detects them at runtime. This helps make sure that you are locking read and write access to shared variables when necessary.</p> <p>On the flip side, there is something Go does not help you out with: deadlocks. Once you've got locks in place for shared variables, you need to make sure you're releasing the locks appropriately too.</p> <p>Thankfully someone wrote a swap-in replacement for the Go <code>sync</code> package called <a href="https://github.com/sasha-s/go-deadlock">go-deadlock</a>. When you import this package instead of the default <code>sync</code> package, it will panic and give you a stacktrace when it thinks you hit a deadlock.</p> <p>Sometimes it thinks you hit a deadlock because a method that needs a lock takes too long. Sometimes the time it takes is legitimate (or something you haven't optimized yet). But actually its default of <code>30s</code> is not really aggressive at all.</p> <p>So I normally set the deadlock timeout to <code>2s</code> and eventually would like to make that more like <code>100ms</code>:</p> <div class="highlight"><pre><span></span>sync.Opts.DeadlockTimeout = 2000 * time.Millisecond </pre></div> <p>It's mostly the <code>persist()</code> function that causes <code>go-deadlock</code> to think there's a deadlock because it tries to synchronously write a bunch of data to disk.</p> <h5 id="<code>go-deadlock</code>-is-slow"><code>go-deadlock</code> is slow</h5><p>The <code>go-deadlock</code> package is incredibly useful. But don't forget to turn it off for benchmarks. With it on I get around 4-8k entries/second. 
With it off I get around 20k-40k entries/second.</p> <h4 id="unbounded-memory">Unbounded memory</h4><p>Another issue in this implementation is that the log keeps growing indefinitely <em>and</em> the entire log is duplicated in memory.</p> <p>There are two ways to deal with that:</p> <ol> <li>Implement Raft snapshotting so the log can be truncated safely.</li> <li>Keep only some number of entries in memory (say, 1 million) and read from disk as needed when logs need to be verified. In ideal operation this would never happen since ideally all servers are always on, never miss entries, and just keep appending. But "ideal" won't always happen.</li> </ol> <p>Similarly, there is unbounded and unreused channel creation for notifying <code>s.Apply()</code> when the user-submitted message(s) finish.</p> <h4 id="net/rpc-and-encoding/gob">net/rpc and encoding/gob</h4><p>In the <code>persist()</code> section above I already mentioned that I prototyped this using Go's builtin gob encoding, and how inefficient it was. It's also pretty slow. I learned that because <code>net/rpc</code> uses it, and after everything else I did <code>net/rpc</code> became the bottleneck in my benchmarks. This isn't incredibly surprising.</p> <p>So a future version of this code might implement its own protocol and own encoding (like we did for disk) on top of TCP rather than use <code>net/rpc</code>.</p> <h4 id="jepsen">Jepsen</h4><p>Everyone wants to know how a distributed algorithm does against <a href="https://github.com/jepsen-io/jepsen">Jepsen</a>, which tests linearizability of distributed systems in the face of network and process faults.</p> <p>But the setup is not trivial so I haven't hooked it up to this project yet. 
This would be a good area for future work.</p> <h4 id="election-timeout-and-the-environment">Election timeout and the environment</h4><p>One thing I noticed as I was trying out alternatives to <code>net/rpc</code> (alternatives that injected latency to simulate a bad environment) is that election timeouts should probably be tuned with the latency of the cluster in mind.</p> <p>If the election timeout is <code>300ms</code> but the latency of the cluster is near <code>1s</code>, you're going to have non-stop leader election.</p> <p>When I adjusted the election timeout to <code>2s</code> with the latency of the cluster near <code>1s</code>, everything was fine. Maybe this means there's a bug in my code but I don't think so.</p> <h4 id="client-request-serial-identifier">Client request serial identifier</h4><p>One major part of the Raft protocol I did not cover is that the client is supposed to send a serial identifier for each message sent to the cluster. This is to ensure that messages are not accidentally duplicated at any level of the entire software stack.</p> <p><a href="https://web.stanford.edu/~ouster/cgi-bin/papers/OngaroPhD.pdf">Diego Ongaro's thesis</a> goes into more detail about this than the Raft paper. Search in that PDF for "session".</p> <p>Again I just completely ignored the possibility of duplicate messages in this implementation so far.</p> <h3 id="references">References</h3><p>Finally, I could not have done this without a bunch of internet help. This project took me about 7 months in total. The first 5 months I was trying to figure it out mostly on my own, just looking at the Raft paper.</p> <p>The biggest breakthrough came from discovering Diego Ongaro's TLA+ spec for Raft. Formal methods sound scary but it was truly not too bad! This was the first "implementation" of Raft that was in a single file of code. 
And under 500 lines.</p> <p>Jack Vanlightly's guide to reading TLA+ helped a bunch.</p> <p>Finally, I had to peer at other implementations, especially to figure out locking and avoiding deadlocks.</p> <p>Here's everything that helped me out.</p> <ul> <li><a href="https://raft.github.io/raft.pdf">In Search of an Understandable Consensus Algorithm</a>: The Raft paper.</li> <li><a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">raft.tla</a>: Diego Ongaro's TLA+ spec for Raft.</li> <li>Jon Gjengset's <a href="https://thesquareplanet.com/blog/students-guide-to-raft/">Students' Guide to Raft</a></li> <li>Jack Vanlightly's <a href="https://medium.com/splunk-maas/detecting-bugs-in-data-infrastructure-using-formal-methods-704fde527c58">Detecting Bugs in Data Infrastructure using Formal Methods (TLA+ Series Part 1)</a>: An intro to TLA+.</li> </ul> <p>And useful implementations I looked at for inspiration and clarity.</p> <ul> <li>Hashicorp's <a href="https://github.com/hashicorp/raft">Raft implementation</a> in Go: Although it's often quite complicated to learn from since it actually is intended for production.</li> <li>Eli Bendersky's <a href="https://github.com/eliben/raft">Raft implementation</a> in Go: Although I got confused following it since it used signed integers and <code>-1</code> to represent base cases. Signed integers is a fair choice as far as I can tell, I just wanted to only use unsigned integers.</li> <li>Jing Yang's <a href="https://github.com/ditsing/ruaft">Raft implementation</a> in Rust: Although I find Rust hard to read.</li> </ul> <p>And I haven't tried these but they look cool:</p> <ul> <li><a href="https://jepsen.io/services#training">Raft course taught by Jepsen</a></li> <li><a href="https://www.dabeaz.com/raft.html">Raft course taught by David Beazley</a></li> </ul> <p>Cheers!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote about implementing Raft in Go. 
By far the most challenging project I&#39;ve worked on in spare time. About 7 months sporadically.<br><br>I&#39;m not an expert, and this is not intended to be used in production. I wanted a better background on the subject!<a href="https://t.co/EhyBuQ4pD3">https://t.co/EhyBuQ4pD3</a> <a href="https://t.co/vGhBbV1shf">pic.twitter.com/vGhBbV1shf</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1661720451616210944?ref_src=twsrc%5Etfw">May 25, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-05-25-raft.htmlThu, 25 May 2023 00:00:00 +0000Two books I recommend to developershttp://notes.eatonphil.com/books-developers-should-read.html<p class="note"> Originally published on February 1, 2021. The original version included two books I don't think are actually so worthwhile. This list is down to two. I think that's a good thing actually. </p><p>These are the books I recommend to developers wanting to improve their skills as professional programmers because of high information density, believable premises/examples, and being well edited.</p> <p>You don't need to read books to improve as a developer but they are unparalleled in quickly helping you gain depth in a subject.</p> <h3 id="high-performance-browser-networking">High Performance Browser Networking</h3><p>If you deal with networks, you would probably benefit from this book. It is a thorough high level introduction to mobile networks, browser network protocols, and fundamentals of networking.</p> <h3 id="designing-data-intensive-applications">Designing Data-Intensive Applications</h3><p>If you use a database (including an in-memory array of items you search periodically) or if you build APIs, you would probably benefit from this book. 
A solid introduction to distributed computing, data transfer, indexing, etc.</p> <h3 id="that's-it!">That's it!</h3><p>Generic software books conspicuously not on this list for me:</p> <ul> <li>Clean Code</li> <li>JavaScript the Good Parts</li> <li>Design Patterns/Gang of Four</li> <li>Structure and Interpretation of Computer Programs</li> <li>A Philosophy of Software Design</li> </ul> <p>They're not all bad but give nowhere near as much return for the investment of your time.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Four books I recommend to professional developers wanting to improve their craft, and a few I&#39;d not<a href="https://t.co/1aTrfqZ9bd">https://t.co/1aTrfqZ9bd</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1356391931274756096?ref_src=twsrc%5Etfw">February 2, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/books-developers-should-read.htmlTue, 16 May 2023 00:00:00 +0000My favorite software subredditshttp://notes.eatonphil.com/high-quality-subreddits-you-should-be-following.html<p class="note"> Originally published on December 5, 2021. </p><p>If you are an experienced software developer whose only exposure to reddit is dank memes, <a href="https://reddit.com/r/programming">proggit</a> or even language-specific subreddits like <a href="https://reddit.com/r/python">/r/python</a>, you're missing out.</p> <p>What follows are my favorite subreddits in tech. My criteria are that:</p> <ul> <li>The subreddit topic is relevant to advancing as a programmer</li> <li>Posts generally go into good depth</li> <li>The comments stay on topic</li> <li>And the shit-posting is minimal</li> </ul> <p>This list isn't hard to guess at if you consider advanced topics in software. 
But I wanted to share because I think it's worth explicitly supporting high-quality subreddits.</p> <ul> <li><a href="https://www.reddit.com/r/EmuDev/">/r/EmuDev</a><ul> <li>My favorite sub of all. Also has a <a href="https://www.reddit.com/r/EmuDev/comments/9mop2q/join_the_official_remudev_chat_on_discord/">phenomenal Discord group</a>.</li> </ul> </li> <li><a href="https://www.reddit.com/r/programminglanguages">/r/ProgrammingLanguages</a><ul> <li>Focuses a little more on PLT topics (parsing techniques, syntax, type systems) than on compiling and interpreting techniques, but still good.</li> </ul> </li> <li><a href="https://www.reddit.com/r/DatabaseDevelopment/">/r/DatabaseDevelopment</a><ul> <li>All about database internals, which ends up involving a bunch of correctness and distributed systems stuff as well.</li> <li>Disclosure: I run this sub. It's at 2.7k+ members at time of publishing.</li> </ul> </li> <li><a href="https://www.reddit.com/r/ReverseEngineering/">/r/ReverseEngineering</a><ul> <li>The largest subreddit on this list but still has pretty good posts.</li> </ul> </li> <li><a href="https://www.reddit.com/r/esolangs/">/r/EsoLangs</a><ul> <li>One of the best/most fun intros to programming languages/compilers/interpreters is through languages like Brainfuck. This sub does a good job of keeping the fun going.</li> </ul> </li> <li><a href="https://www.reddit.com/r/Compilers/">/r/Compilers</a></li> <li><a href="https://www.reddit.com/r/GraphicsProgramming/">/r/GraphicsProgramming</a></li> </ul> <p>While some language subreddits are pretty good, they are more so a mixed bag than some of the topic-specific subreddits here. So they don't make my list, more on principle than anything else.</p> <p>If there is a good one already, send me it!</p> <h3 id="what-am-i-missing?">What am I missing?</h3><p>Am I missing other amazing subreddits? Just don't say language-specific ones. 
:)</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">It&#39;s an incorrect meme IMO that tech Reddit is low-quality. You just have to find the interesting subreddits.<br><br>I&#39;ve updated my list for 2023.<a href="https://t.co/OtM2tk8HOn">https://t.co/OtM2tk8HOn</a> <a href="https://t.co/ymyzChp0SO">pic.twitter.com/ymyzChp0SO</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1658567638090391555?ref_src=twsrc%5Etfw">May 16, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/high-quality-subreddits-you-should-be-following.htmlTue, 16 May 2023 00:00:00 +0000Errors and Zighttp://notes.eatonphil.com/errors-and-zig.html<p>At TigerBeetle these last few weeks I've been doing a mix of documenting client libraries, writing sample code for client libraries, and writing integration tests against the sample code.</p> <p>The client library documentation is generated with a Zig script. The sample code is integration tested with a Zig script. A bunch of Zig scripts.</p> <p>It's not the same <a href="https://github.com/tigerbeetledb/tigerbeetle/blob/main/docs/TIGER_STYLE.md">rigorous</a> sort of Zig as the main database. (We're generally more lax about scripts and test code.)</p> <p><em>And I'm specifically writing this post on my personal blog since my script code is not under incredible scrutiny.</em></p> <p>Furthermore, I'm still new to Zig. Since I'm still learning, there have been a few things that tripped me up.</p> <p>And now that I've written this out, I realize most of my stumbling was related to errors.</p> <h3 id="failure">Failure</h3><p>Lots of things in programs allocate memory. This sounds dumb and obvious but before programming Zig I did not appreciate how many operations I'm used to allocate memory. 
I've previously only programmed in GC languages that do the allocations behind the scenes.</p> <p>Furthermore, memory allocation can fail. Zig makes allocation failures explicit. So lots of things in Zig code need to handle failure.</p> <p>Selectively omitting error handling is not allowed:</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="k">fn</span><span class="w"> </span><span class="n">thing</span><span class="p">(</span><span class="n">a</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="p">}</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span 
class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">thing</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Run <code>zig run test.zig</code>:</p> <div class="highlight"><pre><span></span>test.zig:4:23:<span class="w"> </span>error:<span class="w"> </span>error<span class="w"> </span>is<span class="w"> </span>ignored <span class="w"> </span>std.fmt.allocPrint<span class="o">(</span>a,<span class="w"> </span><span class="s2">&quot;&quot;</span>,<span class="w"> </span>.<span class="o">{})</span><span class="p">;</span> <span class="w"> </span>~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ test.zig:4:23:<span class="w"> </span>note:<span class="w"> </span>consider<span class="w"> </span>using<span class="w"> </span><span class="s1">&#39;try&#39;</span>,<span class="w"> </span><span class="s1">&#39;catch&#39;</span>,<span class="w"> </span>or<span class="w"> </span><span class="s1">&#39;if&#39;</span> referenced<span class="w"> </span>by: <span class="w"> </span>main:<span class="w"> </span>test.zig:12:9 <span class="w"> </span>callMain:<span class="w"> 
</span>/home/phil/vendor/zig-linux-x86_64-0.11.0-dev.2213+515e1c93e/lib/std/start.zig:617:32 <span class="w"> </span>remaining<span class="w"> </span>reference<span class="w"> </span>traces<span class="w"> </span>hidden<span class="p">;</span><span class="w"> </span>use<span class="w"> </span><span class="s1">&#39;-freference-trace&#39;</span><span class="w"> </span>to<span class="w"> </span>see<span class="w"> </span>all<span class="w"> </span>reference<span class="w"> </span>traces </pre></div> <p>This ends up meaning lots of code like:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span> <span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let&#39;s assume this is an arena allocator so I don&#39;t care about freeing.</span> <span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span 
class="n">appendSlice</span><span class="p">(</span><span class="o">&amp;</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;first of something&quot;</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;one more&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;build some string {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">}));</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span 
class="s">&quot;things... {s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">});</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>You have <code>try</code>-es all over the place.</p> <h3 id="limits-of-<code>try</code>">Limits of <code>try</code></h3><p>Now I don't have a problem with acknowledging that allocations can fail. At least outside of scripts. In scripts like I've been writing though I don't really care.</p> <p>Having all of those <code>try</code>-es is just extra typing all over the place.</p> <p>It would be nice if I could have instead done:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span> <span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let&#39;s assume this is an arena allocator so I don&#39;t care about freeing.</span> <span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span 
class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&amp;</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;first of something&quot;</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;one more&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">);</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;build some string {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">}));</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;things... {s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">});</span> <span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>But Zig's <code>try</code> doesn't work like that. I'm not sure why not. The Zig developers are sensible so I'm sure there's a good reason.</p> <p>Still, are there other options?</p> <h3 id="<code>catch-unreachable</code>"><code>catch unreachable</code></h3><p>So the problem isn't just that you have to acknowledge memory allocation failures but that these failures within every helper function need to be acknowledged by the caller of the helper function. Failures infiltrate the entire call tree.</p> <p>Now of course these potential failures would exist whether or not Zig exposed them. 
So I don't mean to say it's Zig's fault for exposing them.</p> <p>But you can avoid failure handling if, instead of <code>try</code>-ing everything, you mark error conditions as <code>unreachable</code>.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">do_stuff</span><span class="p">(</span> <span class="w"> </span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="c1">// Let&#39;s assume this is an arena allocator so I don&#39;t care about freeing.</span> <span class="w"> </span><span class="n">stuff</span><span class="o">:</span><span class="w"> </span><span class="n">Stuff</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">alloc</span><span class="p">);</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&amp;</span><span class="p">[</span><span class="n">_</span><span class="p">][]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;first of something&quot;</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;one 
more&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">stuff</span><span class="p">.</span><span class="n">thing</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">x</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;build some string {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">stuff</span><span class="p">.</span><span class="n">athing</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">other_stuff</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">allocPrint</span><span class="p">(</span><span class="n">alloc</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;things... 
{s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">blah</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">do_other_stuff</span><span class="p">(</span><span class="n">x</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">other_stuff</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>As you can see from the function signature, this function no longer returns any error at all. But it could possibly panic.</p> <p>Now in scripts, for things like memory allocations that can fail, I actually think it's reasonable to mark allocation failures as unreachable.</p> <p>But I took it a bit further. 
Using <code>@panic</code> or <code>unreachable</code> in general failure conditions.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">run</span><span class="p">(</span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">cmds</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ChildProcess</span><span class="p">.</span><span class="n">exec</span><span class="p">(.{</span> <span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alloc</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">argv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cmds</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">term</span><span class="p">)</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span 
class="n">Exited</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">code</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">@panic</span><span class="p">(</span><span class="s">&quot;Expected command to succeed.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="handling-panics">Handling panics</h3><p>But there are some things that will fail quite frequently (like running subprocesses or interacting with the filesystem in general).</p> <p>Panicking (like what happens if <code>@panic()</code> <s>or <code>unreachable</code></s> is hit) in these situations is all good until you have things that you want to get cleaned up.</p> <p class="note"> My <a href="https://matklad.github.io/">coworker</a> points out I'm wrongly conflating <code>unreachable</code> and <code>@panic()</code> since depending on the release mode, hitting <code>unreachable</code> is actually undefined behavior whereas <code>@panic()</code> is always a panic. </p><p>Panics don't trigger <code>defer</code> or <code>errdefer</code> statements. 
So if you have a script that starts a background process or creates a temporary directory, and if you panic in that script, the script won't be able to run <code>defer</code> steps to stop the background process or delete the temporary directory.</p> <p>There are panic handlers in Zig (not yet documented; Ctrl-f for "TODO: pub fn panic" in the <a href="https://ziglang.org/documentation/master/">Zig docs</a>). But I'd just be getting further from what seems sensible if I went in that direction.</p> <h3 id="zig-errors">Zig errors</h3><p>So I stopped panic-ing everywhere and switched to using real Zig errors, like:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">run</span><span class="p">(</span><span class="n">alloc</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">cmds</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ChildProcess</span><span class="p">.</span><span class="n">exec</span><span class="p">(.{</span> <span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">alloc</span><span 
class="p">,</span> <span 
class="w"> </span><span class="p">.</span><span class="n">argv</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">cmds</span><span class="p">,</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="p">.</span><span class="n">term</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">Exited</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">code</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">code</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Expected command to succeed.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">RunCommandFailed</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> 
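<p>To make the difference from panicking concrete, here is a minimal sketch, separate from my actual scripts, of a function that registers cleanup steps and then returns an error. The names <code>failingStep</code> and <code>script</code> are made up for illustration, and the prints stand in for stopping a background process or deleting a temporary directory.</p>
<div class="highlight"><pre><span></span>const std = @import(&quot;std&quot;);

// Hypothetical helper for this sketch: a step that always fails.
fn failingStep() !void {
    return error.StepFailed;
}

fn script() !void {
    std.debug.print(&quot;setup\n&quot;, .{});
    // Runs whenever script() returns, including on the error path.
    defer std.debug.print(&quot;cleanup (defer)\n&quot;, .{});
    // Runs only when script() returns an error.
    errdefer std.debug.print(&quot;cleanup (errdefer)\n&quot;, .{});

    try failingStep();
    std.debug.print(&quot;finished without errors\n&quot;, .{});
}

pub fn main() void {
    // Handle the error here instead of panicking so the cleanup above gets to run.
    script() catch |err| {
        std.debug.print(&quot;script failed: {s}\n&quot;, .{@errorName(err)});
    };
}
</pre></div>
<p>If I have this right, both cleanup lines print before <code>main</code> reports the failure, which is the part a panic would have skipped.</p>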
<p>It's pretty sweet. You get to make up a new <code>error</code> enum wherever you'd like.</p> <p>It is unfortunate you can't (currently) include a payload with the error return value. There's an <a href="https://github.com/ziglang/zig/issues/2647">active issue discussing it</a>.</p> <p>But so far I've been able to work around that, as seen in that example above, by logging before returning an error. Since most of the time the payload you want to return is detailed information to provide context.</p> <p>This logging is fine in a CLI application but probably not everything you'd want in a library. I'm not sure.</p> <p>And now without panics, functions that deal with <code>error</code> enums and <code>try</code> work with <code>defer</code> and <code>errdefer</code> again! Cleanup of my background processes and temporary directories happens like I want.</p> <h3 id="handling-errors-with-<code>if</code>">Handling errors with <code>if</code></h3><p>Ok so now that I'm fully bought into Zig errors there were still a few more things that tripped me up.</p> <p>First is that you can handle errors a few ways. 
You already saw the first one with <code>try</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">thingThatCouldFail</span><span class="p">();</span> </pre></div> <p>This will cause the function the statement is inside to short-circuit, returning immediately, if <code>thingThatCouldFail</code> has an error result.</p> <p>But then I wanted to retry a function that could fail in a loop after handling the error.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// do something that should fix 
it for the next time</span> <span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>But that isn't a real syntax. The Zig docs show an example of how you can use <code>if</code> with an <code>error</code> function:</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">doAThing</span><span class="p">(</span><span class="n">str</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">number</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">doSomethingWithNumber</span><span class="p">(</span><span class="n">number</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">Overflow</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="c1">// handle overflow...</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="c1">// we promise that InvalidChar won&#39;t happen (or crash in debug mode if it does)</span> <span class="w"> </span><span class="k">error</span><span class="p">.</span><span class="n">InvalidChar</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>But I don't care about the error at this moment (maybe I should, but I don't right now).</p> <p>So I tried:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// do something that should fix it for the next time</span> <span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>But that gives me an obscure type error.</p> <p>I was stumped here for a while until I decided to try the whole syntax in that example. And it turns out that at least the capture part is necessary at the parser layer:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// do something that should fix it for the next time</span> <span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And eventually I guessed an unnamed error variable might also work without the switch, and that was correct:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">x</span><span class="o">:</span><span class="w"> </span><span class="n">SomeType</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">somedefault</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">tries</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">thingThatCouldFail</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">good_value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">good_value</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">|</span><span class="n">_</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// do something that should fix it for the next time</span> <span class="w"> </span><span class="n">tries</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Nice!</p> <h3 id="<code>catch</code>-blocks"><code>catch</code> blocks</h3><p>One last thing that I was stumbling around with was that when you use <code>catch</code> with a function that returns an error or some non-void value, the catch must "return" a value of the same type as the function.</p> <p>The Zig docs show a simple example:</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="mi">13</span><span class="p">;</span> </pre></div> <p>But I also use <code>catch</code> with blocks sometimes:</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// do some more complex stuff, maybe log, who knows</span> <span class="p">};</span> </pre></div> <p>But that won't compile. So the "trick" is to combine Zig's <a href="https://ziglang.org/documentation/master/#Blocks">named blocks</a> with <code>catch</code>.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parseU64</span><span class="p">(</span><span class="n">str</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="n">blk</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// do some more complex stuff, maybe log, who knows</span> <span class="w"> </span><span class="c1">// and then &quot;return&quot; a result</span> <span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="o">:</span><span class="n">blk</span><span class="w"> </span><span class="mi">13</span><span class="p">;</span> <span class="p">};</span> </pre></div> <h3 id="contributing-to-zig-docs">Contributing to Zig docs</h3><p>I didn't want to write this post without offering some of my examples to the docs. 
While there's a dedicated effort around autodoc, the tool that builds docs for the standard library, I haven't yet stumbled on docs for contributing to the main Zig docs.</p> <p>So I grepped in the Zig repo <code>git grep 'Blocks are expressions.'</code>, a phrase that showed up in the HTML docs, and found <code>doc/langref.html.in</code>.</p> <p>Then someone on the <a href="https://discord.gg/gxsFFjE">Zig Programming Language Discord</a> pointed me at running <code>zig build docs</code> in the repo root to generate the HTML.</p> <p>And now I've got a <a href="https://github.com/ziglang/zig/pull/15042">PR up</a>! We'll see what folks think.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post about error-handling and Zig, as I&#39;ve been doing a bunch of scripting with Zig recently.<br><br>I stumbled a few times so maybe that will be useful to you. And I was able to turn parts of my stumbling into a potential PR to the Zig docs. 🎉<a href="https://t.co/00RVWpodmd">https://t.co/00RVWpodmd</a> <a href="https://t.co/wENSEpj63A">pic.twitter.com/wENSEpj63A</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1638350047887622145?ref_src=twsrc%5Etfw">March 22, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/errors-and-zig.htmlTue, 21 Mar 2023 00:00:00 +0000Notes from Neal Gabler's Walt Disneyhttp://notes.eatonphil.com/2023-02-18-neal-gabler-walt-disney-notes.html<p>Disney was a celebrity by his mid-30s, Disney the company was famous by the 1930s.</p> <p>Even though politically the 1930s was considered the decade of Roosevelt (elected President in 1932), culturally the 1930s was considered the decade of Mickey Mouse.</p> <p>Almost every new animation/filmmaking technique they tried, they would experiment with it in shorts (Silly Symphonies) before applying to big films like Snow White. 
Examples of this include:</p> <ul> <li>Multiple layers of animation moving independently to create depth in <a href="https://www.youtube.com/watch?v=MYEmL0d0lZE">The Old Mill</a></li> <li>The first Disney animations with humans (not flora/fauna) like <a href="https://www.youtube.com/watch?v=SRB2YlQOSBI">The Cookie Carnival</a></li> </ul> <p>Nobody took animation seriously, didn't think there was much possibility for it in film. Disney kept pushing the envelope. Some examples include:</p> <ul> <li>Not including the hand inside the drawing (<a href="https://www.youtube.com/watch?v=ERokauUI6TA">though early Disney ones did</a>)</li> <li>Eventually focusing on actual stories, not just gags/jokes</li> <li>Sound (the famous Mickey <a href="https://www.youtube.com/watch?v=BBgghnQF6E4">Steamboat Willie animation</a>, <a href="https://www.loc.gov/static/programs/national-film-preservation-board/documents/steamboat_willie.pdf">read more</a>)</li> <li>Merchandise, not just the art</li> <li><a href="https://www.rarenewspapers.com/view/557744">Feature films (i.e. Snow White in 1937, the first animated feature film), not just shorts</a></li> <li>Brought Hollywood to Television<ul> <li>"Walt Disney signed an exclusive long-term contract today with the American Broadcasting Company to become the first leading Hollywood producer to enter into formal alliance with television. <a href="https://timesmachine.nytimes.com/timesmachine/1954/04/03/84611681.html?pageNumber=19">NY Times</a></li> </ul> </li> <li>Added <a href="https://d23.com/the-wonderful-things-about-walt-disneys-wonderful-world-of-color/">color to TV shows</a></li> </ul> <h3 id="snow-white">Snow White</h3><p>Disney hired fine arts teachers to come and teach employees. From time to time he forced the artists to take night classes.</p> <p>They trained for years(?) 
before <em>starting</em> the animation of Snow White and did almost all the animation in the last 10 months or so before the release in December 1937.</p> <p>They had to do 24-hour animation in 8 hour shifts to get up to speed. They had to hire 100s of animators to do fill in work so the “master” animators could focus on “drawing the extremes”.</p> <p>The average age at Disney was 25. These days of the 1930s really felt quite similar to what a Silicon Valley startup is thought to be.</p> <p>Disney preferred to hire recent art school students so they could train them in the Disney style.</p> <p>They could not animate humans during Snow White well enough so they ended up just tracing them, called <a href="https://imgur.com/gallery/IZkSR">rotoscoping</a>.</p> <p>The Snow White voice cast were quite famous at the time. We wouldn't know it now but it was basically an ensemble cast.</p> <h3 id="world-war-2">World War 2</h3><p>Ran low on money so they produced films for the <a href="https://en.wikipedia.org/wiki/List_of_Walt_Disney%27s_World_War_II_productions_for_Armed_Forces">US Government</a>. <a href="https://www.smithsonianmag.com/history/how-disney-propaganda-shaped-life-on-the-home-front-during-wwii-180979057/">Propaganda</a>, basically. But also <a href="https://www.youtube.com/watch?v=kRVFQs2XYy4">instructional videos</a>.</p> <p><a href="https://animationguild.org/about-the-guild/disney-strike-1941/">Disney workers began striking (1941)</a> and established unions. If Disney was a dick before this, he became a much bigger dick after this.</p> <h3 id="post-war">Post War</h3><p>Got into television with ABC initially. First Hollywood company to do so. Arrangement with ABC was in part to finance Disneyland. 
(Not covered in the book but Disney <a href="https://www.nytimes.com/1995/08/01/business/media-business-merger-walt-disney-acquire-abc-19-billion-deal-build-giant-for.html">eventually took over ABC</a>, not before eventually splitting with ABC and working with NBC though.)</p> <p>Disney stopped caring about films and moved to mostly thinking about Disneyland, this under WED (what is now Walt Disney Imagineering).</p> <p>After Disneyland launched he moved on to world fairs and eventually Disneyworld. He died of lung cancer before completing Disneyworld.</p> <h3 id="tidbits">Tidbits</h3><ul> <li><a href="https://www.disneyplus.com/video/aa400cf1-a54d-4187-997d-573711c88697">The Reluctant Dragon</a>, a throwaway film because they needed money when they went public. It is the story of a children's book author trying to get Disney to make a film out of his book. He stumbles around the new Disney Burbank Studio through art classes and musicians practicing, uncovering how Disney films are made in the process.</li> </ul> <h3 id="questions">Questions</h3><ul> <li>What were the other major animation studios? Even if Snow White was the first animated feature film, surely others must have rushed to copy the success. Who were they?<ul> <li>UPA (Mr Magoo) was one. Also Warner Brothers</li> </ul> </li> </ul> <h3 id="conclusion">Conclusion</h3><p>Basically after every turn he'd get tired of the stuff he had already done (and killed at doing) to do something new. From animated shorts to feature films to television to Disneyland to Disneyworld and EPCOT.</p> <p>To his employees he was a huge dick. They'd be in constant fear of upsetting him and getting fired. And he admitted that he would basically fire people randomly. He'd fire anyone important enough to get their name on a door (i.e. establish their own fiefdom within the company). 
But it seems more like Disney the company worked in spite of this rather than because of this.</p> <p><strong>After Mary Poppins (1964, two years before he died): "I'm on the spot. I have to keep trying to keep up to that same level. And the way to do it is not to worry, not to get tense. Not to think, 'I got to beat Mary Poppins', 'I got to beat Mary Poppins'. The way to do it is just to go off and get interested in some little thing, some little idea that interests me. Some little idea that looks like fun."</strong></p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Finished Neal Gabler&#39;s Walt Disney (5/5) and here are my raw notes. (If I had to polish the notes I wouldn&#39;t have the will to publish.) Hopefully a few interesting bits and links in there though.<br><br>In particularly this quote (2nd pic) really struck me.<a href="https://t.co/P9astFZ6Ts">https://t.co/P9astFZ6Ts</a> <a href="https://t.co/wKPd6zjLau">pic.twitter.com/wKPd6zjLau</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1627025676162281472?ref_src=twsrc%5Etfw">February 18, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-02-18-neal-gabler-walt-disney-notes.htmlSat, 18 Feb 2023 00:00:00 +0000Lessons learned streaming building a Scheme-like interpreter in Gohttp://notes.eatonphil.com/2023-01-30-livescheme.html<p>I wanted to practice making coding videos so I did a <a href="https://www.youtube.com/watch?v=lZNhZI-dN9k&amp;list=PLjJMyANAIVHEgUOK2cU0hrvSwFPNHT2a7">four-part series</a> on writing a basic Scheme-like language (minus macros and arrays and tons of stuff).</p> <p>I picked this simple topic because I wanted a low-stakes way to learn what I did not know about making videos.</p> <p>Here was the end result (nothing crazy):</p> <div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span 
class="nv">go</span><span class="w"> </span><span class="nv">build</span> <span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">examples/fib</span><span class="o">.</span><span class="nv">scm</span> <span class="p">(</span><span class="nf">func</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nf">a</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">&lt;</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="nv">a</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">)))))</span> <span class="p">(</span><span class="nf">fib</span><span class="w"> </span><span class="mi">11</span><span class="p">)</span> <span class="nv">$</span><span class="w"> </span><span class="o">.</span><span class="nv">/livescheme</span><span class="w"> </span><span class="nv">examples/fib</span><span class="o">.</span><span class="nv">scm</span> <span class="mi">89</span> </pre></div> <p>The code for the project is <a href="https://github.com/eatonphil/livescheme">here</a>.</p> <h3 id="video-archives">Video archives</h3><p>Here are the four episodes! Each about an hour long. 
One per week for four weeks.</p> <ul> <li><a href="https://www.youtube.com/watch?v=lZNhZI-dN9k">Part 1: A lexer</a></li> <li><a href="https://www.youtube.com/watch?v=5ttFEPQopXc">Part 2: Parsing</a></li> <li><a href="https://www.youtube.com/watch?v=YwmGcverSHI">Part 3: AST walking interpreter</a></li> <li><a href="https://www.youtube.com/watch?v=skDhTWILH8I">Part 4: Cleanup and Fibonacci</a></li> </ul> <h3 id="live-live">Live live</h3><p>The videos were <a href="https://twitch.tv/eatonphil">streamed to Twitch</a> live.</p> <p>I didn't prep for them because I wanted to show warts and all. The thought process.</p> <p>But some things turned out to be tricky to explain without preparation (function calling conventions, mostly).</p> <p>Overall hopefully the series was somewhat useful.</p> <h3 id="full-screen-windows">Full screen windows</h3><p>The first episode I did I didn't make sure that the terminal window was captured full screen. So some of my code went off the bottom of the video. That was dumb.</p> <p>I even have a tmux mode-line at the bottom of the terminal app that I could have looked for to notice it didn't exist in the OBS view.</p> <p>So I made sure to have the full window in view after the first episode.</p> <h3 id="twitch-moderation">Twitch moderation</h3><p><a href="https://safety.twitch.tv/s/article/Protect-your-channel-with-Shield-Mode">Twitch Shield Mode</a> is great. But the default setting prevents folks from commenting live until they've followed you for 2 weeks or something.</p> <p>For someone starting a channel that doesn't make much sense. So in my first video I disabled it so folks could chat. And then some crypto scammer came in. Go figure.</p> <p>After the first video I turned Shield Mode back on but set the minimum follow time to 10 minutes I think.</p> <h3 id="obs-studio">OBS Studio</h3><p>I used <a href="https://obsproject.com/">OBS Studio</a> to record. 
I was frustrated with it for a while because the video would lag so much when I tested out streaming. After playing around with Twitch Studio and giving up on it for being too simple, I messed with OBS video settings enough to get my video to not lag. Unfortunately I can't remember what settings I used.</p> <h3 id="noise-gate-/-pop-filter">Noise Gate / pop filter</h3><p>The <a href="https://obsproject.com/kb/noise-gate-filter">Noise Gate Filter</a> is awesome. My mechanical keyboard sounded obnoxious before I turned it on. I was considering getting a pop filter but then discovered that the Noise Gate Filter is built in; you just have to turn it on.</p> <h3 id="scenes">Scenes</h3><p>It also took me a while to understand OBS Scenes but then I realized I can use them to have an intro graphic (without the mic on!), a main coding scene (focused on my terminal and with my webcam overlaid), and a "back soon" graphic if I needed it.</p> <p>To get the mic off you have to <a href="https://obsproject.com/forum/threads/mute-one-specific-scene.43661/">disable the mic globally</a> (it's on globally by default) and then add it as an input only to the scenes you want.</p> <h3 id="storage-and-export-to-youtube">Storage and export to YouTube</h3><p>Twitch doesn't store streams by default. You have to turn on <a href="https://help.twitch.tv/s/article/video-on-demand?language=en_US">Video on Demand</a>.</p> <p>Even when it's turned on the videos only seem to be stored for 1 week. Maybe that's configurable but I didn't see it.</p> <p>In any case it's not a problem because you can set up a YouTube connection. Then after a stream is complete you find the stream video and click Export. It takes about a minute to upload the hour-long videos I did. 
Though YouTube post-processing took a while longer after that.</p> <h3 id="next?">Next?</h3><p>I'm forced to take a break from recording these videos for the next two weeks since I'll be <a href="https://systemsdistributed.com/">in Cape Town</a>.</p> <p>I haven't decided yet if I'll continue this series (not something I'm extremely excited about since everyone builds a Scheme-like language).</p> <p>I'd like to have a project that I can keep contributing to over time but I don't see very much value in doing that based on a Scheme or any lisp-like.</p> <p>Maybe I'll do a basic JavaScript implementation next. Or another basic SQL database. Dunno.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Now that I&#39;m done that series on the Scheme-like interpreter in Go (at least for a few weeks), I wrote down a few thoughts about the experience and the Twitch and OBS Studio setup.<br><br>Up next after Cape Town? Not totally sure yet!<a href="https://t.co/bgdO1ZI5Ow">https://t.co/bgdO1ZI5Ow</a> <a href="https://t.co/E1kwMRcCWY">pic.twitter.com/E1kwMRcCWY</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1620239367037157376?ref_src=twsrc%5Etfw">January 31, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/2023-01-30-livescheme.htmlMon, 30 Jan 2023 00:00:00 +0000An effective product managerhttp://notes.eatonphil.com/effective-product-manager.html<p>There are three specific activities I have loved in some product managers I've worked with (and missed in others).</p> <p>tldr;</p> <ul> <li>Talk with customers and prospects</li> <li>Develop and share a vision</li> <li>Evangelize</li> </ul> <h3 id="talk-with-customers-and-prospects">Talk with customers and prospects</h3><p>As a product manager, your superpower over engineering is to have spent time with customers and prospects. 
You should have (or develop) a good understanding of the market and your product's potential.</p> <p>The only way you can do this is by spending time, over time, with customers and prospects. Understanding their workflows and their issues.</p> <h3 id="develop-and-share-a-vision">Develop and share a vision</h3><p>Cynical folks will cringe at the word "vision" but it is a serious and necessary part of a successful organization.</p> <p>As a product manager, you should establish and share a path for engineering to follow based on your understanding of customers, prospects, the market, and the company.</p> <p>This is the "roadmap" and "prioritization". But prioritization is useless without a long-term vision.</p> <p>The roadmap should represent (and broadly demonstrate) a concrete and meaningful goal. A goal that you can and should adjust over time as the company and market change.</p> <h3 id="evangelize">Evangelize</h3><p>In bigger organizations there might be dedicated evangelism teams. But product managers must drive this work.</p> <p>Evangelism should fit the vision you've developed.</p> <p>And in the absence of dedicated evangelism teams, product managers should be creating demos, writing blog posts, and testing the solution with customers and prospects.</p> <p>Again, it's fine for dedicated teams outside of product management to do bits of that work. But it must be driven and led by the product manager.</p> <h3 id="it's-hard">It's hard</h3><p>Observed as I have from outside, being an effective product manager feels like a massively challenging task.</p> <p>It's so easy to go without talking to customers, to get sucked into day-to-day issues and not create a vision, and to allow evangelism to happen ad-hoc.</p> <p>Then there's the fact you don't live in a vacuum. You may have a boss in product management. Your engineering peers may have competing priorities. You may have a hard time understanding the founders or CEO. 
In a large company, you may not even have a CEO.</p> <h3 id="my-ideas,-your-ideas">My ideas, your ideas</h3><p>These are my ideas based on my <a href="https://eatonphil.com/">experience</a>. You may have your own ideas. If mine help you, great! If they don't, great!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I&#39;ve been considering recently what makes an effective product manager. So I wrote down a few of my thoughts.<br><br>What I&#39;ve loved the most in some PMs and missed the most in others.<br><br>I&#39;d likewise love to hear what you think!<a href="https://t.co/5vTWTNhs68">https://t.co/5vTWTNhs68</a> <a href="https://t.co/vXjPY9fiVT">pic.twitter.com/vXjPY9fiVT</a></p>&mdash; Phil Eaton (@eatonphil) <a href="https://twitter.com/eatonphil/status/1617661616593723394?ref_src=twsrc%5Etfw">January 23, 2023</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/effective-product-manager.htmlMon, 23 Jan 2023 00:00:00 +0000The year in books: 2022http://notes.eatonphil.com/2023-01-12-year-in-books.html<p>In 2022 I <a href="https://www.goodreads.com/challenges/11636-2022-reading-challenge">finished 20 books</a> spanning 15,801 pages. 3 more than I read in 2021, but about twice the number of pages. 3 fiction and 17 non-fiction. Another ~30 started but not finished.</p> <p>I had a hard time reading books while I was trying to start my own company. But I also discovered audiobooks. I would put on a book and listen while I did my chores. Only 5 of the 20 books I finished were physical (or kindle) books. 
The other 15 were audiobooks.</p> <h3 id="non-fiction:-13-to-recommend">Non-fiction: 13 to recommend</h3><p>After I started reading Robert Caro's Master of the Senate I got hooked on history and felt less daunted by larger books.</p> <p>The only non-fiction I read in 2022 was US and UK history.</p> <p>Here were my favorites:</p> <ul> <li><a href="https://www.goodreads.com/book/show/86525.Master_of_the_Senate">Master of the Senate</a> by Robert Caro: Covering not just Lyndon B. Johnson but also the history of the Senate and the Civil Rights movement in the US. This book is now on my <a href="https://lists.eatonphil.com/book-recommendations.html">list of best books</a>.</li> <li><a href="https://www.goodreads.com/book/show/19809.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Visions of Glory, 1874-1932</a> by William Manchester: First in a three-volume series about Churchill. He's an especially interesting guy to read about because he served in UK politics from 1901 until his retirement (for the second time) as UK Prime Minister in 1955. He was First Lord of the Admiralty in World War 1 before he, more famously, became Prime Minister during World War 2. This entire series is on my <a href="https://lists.eatonphil.com/book-recommendations.html">list of best books</a>.</li> <li><a href="https://www.goodreads.com/book/show/42547.The_Autobiography_of_Martin_Luther_King_Jr_">The Autobiography of Martin Luther King, Jr.</a>: Sad and revealing. Though it doesn't talk much about his legacy since it only includes his writings.</li> <li><a href="https://www.goodreads.com/book/show/13049569-the-passage-of-power">Passage of Power</a> by Robert Caro: Covering LBJ's pathetic failed attempts at the presidency before becoming JFK's Vice President, up to JFK's assassination. Still a very good book. 
I can't wait for Caro's final book to come out.</li> <li><a href="https://www.goodreads.com/book/show/2279.Truman">Truman</a> by David McCullough: I always thought Truman was a lame nerd but he actually had a very interesting life (and as I'd later discover, is far from the lamest president. Wilson hands down takes that place.) And unlike most other famous politicians I read about, he had a great relationship with his wife. He was honest and respectable and was the first US president to normalize relations with Mexico since the Mexican-American War (which U.S. Grant and Robert E. Lee both fought in during the 1840s).</li> <li><a href="https://www.goodreads.com/book/show/55751.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Alone, 1932-40</a> by William Manchester: The second book in the series. Pretty depressing because it's a decade of Churchill noticing Nazi German behavior and stressing UK preparedness, and the UK ignoring both him and Nazi Germany.</li> <li><a href="https://www.goodreads.com/book/show/746673.The_Last_Lion">The Last Lion: Winston Spencer Churchill: Defender of the Realm, 1940-1965</a> by William Manchester: The final book in the series, covering his Prime Ministership.</li> <li><a href="https://www.goodreads.com/book/show/884536.Eleanor_Roosevelt_Volume_1">Eleanor Roosevelt, Volume 1: The Early Years, 1884-1933</a> by Blanche Wiesen Cook: Her background and many problems, as the daughter of Theodore Roosevelt's brother and later wife of their distant cousin, are pretty hard to relate to. 
Still it was quite interesting to hear about her life and early activities, and how she went from being quite conservative to becoming such an outspoken progressive activist.</li> <li><a href="https://www.goodreads.com/book/show/17082810-abraham-lincoln">Abraham Lincoln: A Life, Volume One</a> by Michael Burlingame</li> <li><a href="https://www.goodreads.com/book/show/17082819-abraham-lincoln">Abraham Lincoln: A Life, Volume Two</a> by Michael Burlingame</li> <li><a href="https://www.goodreads.com/book/show/34237826-grant">Grant</a> by Ron Chernow: Among famous generals of the Civil War, somehow Robert E. Lee and Stonewall Jackson came to mind more readily than Grant. I'm glad I read this book because the popularity of Southern generals today seems like revisionism. This book makes strong arguments that while Lee was a great officer, he could only think in terms of short-term tactics and the Virginia region. Whereas Grant was the first (US, anyway) officer to consider and command (via telegraph) all theaters of war at once, every day. And this book redeems his presidency somewhat. His progressive adoption of freed Black people and work to make them equal citizens are highly commendable. Even with the horror of what happened in the South after the war ended.</li> <li><a href="https://www.goodreads.com/book/show/40929.The_Rise_of_Theodore_Roosevelt">The Rise of Theodore Roosevelt</a> by Edmund Morris: First in a three-volume series about the 26th President. I read somewhere that it can feel impossible to read a bad biography of Roosevelt because he was such an interesting human. That may be true. This book didn't disappoint. Roosevelt growing up in a townhouse in Manhattan, going to Harvard, buying a farm on Long Island is all hard to relate to. His Puritanical morals and machismo were also difficult to get past. 
But he was a very interesting guy.</li> <li><a href="https://www.goodreads.com/book/show/40923.Theodore_Rex">Theodore Rex</a> by Edmund Morris: Second in the series, covering the entirety of Roosevelt's presidency. Like the first volume, a great read. I always used to think Roosevelt was a pure war-monger. But he helped avert war with the UK and Germany over Venezuelan debt-default. And he later received the Nobel Peace Prize for mediating peace between Japan and Russia in 1905.</li> </ul> <h3 id="fiction:-1-to-recommend">Fiction: 1 to recommend</h3><p>Of the three I read last year, I really enjoyed one:</p> <ul> <li><a href="https://www.goodreads.com/book/show/18950097-the-leopard">The Leopard</a> by Giuseppe Tomasi di Lampedusa: A gentle piece of historical fiction set during the 1860s in Sicily during and after the unification of Italy. I learned about this book from a Rick Stein episode in the Mediterranean Escapes series.</li> </ul> http://notes.eatonphil.com/2023-01-12-year-in-books.htmlThu, 12 Jan 2023 00:00:00 +0000Favorite compiler and interpreter resourceshttp://notes.eatonphil.com/2023-01-04-compiler-resources.html<head> <meta http-equiv="refresh" content="4;URL='https://lists.eatonphil.com/compilers-and-interpreters.html'" /> </head><p>This is an external post of mine. Click <a href="https://lists.eatonphil.com/compilers-and-interpreters.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2023-01-04-compiler-resources.htmlThu, 05 Jan 2023 00:00:00 +0000General book recommendationshttp://notes.eatonphil.com/2023-01-04-book-recommendations.html<head> <meta http-equiv="refresh" content="4;URL='https://lists.eatonphil.com/book-recommendations.html'" /> </head><p>This is an external post of mine. 
Click <a href="https://lists.eatonphil.com/book-recommendations.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2023-01-04-book-recommendations.htmlWed, 04 Jan 2023 00:00:00 +0000In response to a frontend developer asking about database developmenthttp://notes.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html<head> <meta http-equiv="refresh" content="4;URL='https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html'" /> </head><p>This is an external post of mine. Click <a href="https://letters.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2023-01-01-letter-to-a-frontend-developer-asking-about-database-development.htmlSun, 01 Jan 2023 00:00:00 +0000Is it worth writing about?http://notes.eatonphil.com/is-it-worth-writing-about.html<p>You acquire a skill or experience through time and effort, then downplay the impact of writing and sharing the learning process.</p> <p>Professionals seem naturally to imagine a high bar for what is worth writing about.</p> <p>I think that's misguided. This article is not criticism of folks with these beliefs, but rather encouragement for folks looking for a reason to write.</p> <p>There are (at least) a few concrete reasons to write about what you've learned, even when you don't think it's novel.</p> <h3 id="to-practice-writing">To practice writing</h3><p>This is the easiest reason. While practice does not imply improvement, you cannot improve without practice.</p> <p>Every time you learn something is a chance to write down both what you've learned and also how you learned it.</p> <p>For professional developers this chance happens all the time. Daily, really. 
But most developers, even those who want to write more, let the opportunity slip.</p> <h3 id="providing-variety">Providing variety</h3><p>When I learn a topic I normally go through dozens of posts, papers, docs, videos or books to find a version that clicks. If I can. I prefer to start with blog posts and often there are no blog posts on the subject. Books, docs, videos, and academic papers often aren't as accessible.</p> <p>Even if you're writing about a popular topic, there's still a chance your post gets through to someone in a way other posts do not.</p> <p class="note"> For programmers there are notorious topics you can avoid if you'd like ("What is a monad", "Why is Lisp interesting", "Kubernetes sucks"). Or not. I've fallen into those traps. </p><p>Additionally, as you gain experience as a programmer (or product manager, or whatever), your perspective and approach become both more interesting and more valuable.</p> <p>I don't recall ever thinking: "I wish they'd write less". But I'm always wishing some folks wrote more, or at all.</p> <p>Some folks with experience writing about widely varied topics in software include:</p> <ul> <li><a href="https://eli.thegreenplace.net/">Eli Bendersky</a></li> <li><a href="https://nullprogram.com/blog/2015/03/19/">Chris Wellons</a></li> <li>And <a href="https://zserge.com/">Serge Zaitsev</a></li> </ul> <p>But experience need not be a prerequisite. Experts (who don't practice explaining) easily forget how they came to their current understanding. 
A beginner's experience is valuable for everyone who is not a beginner, and sometimes for other beginners too.</p> <h3 id="to-cement-understanding">To cement understanding</h3><p>Finally, honest writing <em>forces</em> you to either understand the dark corners of what you've learned or to ask for help in these dark corners.</p> <p>I have repeatedly wrestled with topics in software only to be further forced to explain <em>why</em> (or <em>how</em>) when I write.</p> <p>And it has often forced me to restructure code or ideas in ways that are easier to explain. I think that's a pretty valuable act for the long term.</p> <h3 id="bad-faith">Bad faith</h3><p>There's a bad-faith argument that you sometimes see. Here's a variation that comes to mind.</p> <blockquote><p>The internet is already full of crap. People who aren't experts are just making it worse.</p> </blockquote> <p>I hope you ignore these comments. :) If there's a quality problem that is genuinely causing harm, that's for search engines and trade organizations to deal with.</p> <h3 id="in-the-extreme">In the extreme</h3><p><a href="https://til.simonwillison.net/">Simon Willison's TIL</a> site is the most prolific version of this I've ever seen. 
I don't know if I personally aspire to Simon's level, but I think it's worth seeing.</p> <h3 id="topics">Topics</h3><p>Some topics I think are always worth writing about and sharing:</p> <ul> <li>Your process, failures and successes included, of figuring something out</li> <li>How to hack on some major open source project</li> <li>In-depth comparison of projects or approaches, down to source code, benchmarks, and architecture when relevant</li> <li>Building minimal versions of some production system</li> <li>How some major system works under the hood, down to the code</li> <li>Mistakes you made in structuring organizations, or production architecture, or testing, etc.</li> <li>How to get the dang configuration right for testing Electron apps in GitHub Actions</li> </ul> <p>For programming posts specifically: I strongly encourage you to include or walk through working code. Have tests. And have the code build process hooked up to GitHub Actions or SourceHut CI or whatever. This helps ensure your work is still relevant over time.</p> <h3 id="when-you-write">When you write</h3><p>Write to explain and teach. When you don't understand something, call out that you don't understand it. That's not a bad thing, and the internet is normally happy to help.</p> <p>Don't shy away from showing code, showing things that broke, showing the ugly process. It's encouraging for others to see.</p> <h3 id="end-goal">End goal</h3><p>Well, ideally we have fewer clickbait "5 best React alternatives" articles and more thoughtful pieces intended to teach and educate with a bit of rigor.</p> <p>It's better for individuals and for companies. 
It's better for the internet.</p> <h3 id="community">Community</h3><p>If you want a community of folks where you can find encouragement to write and eyes to review drafts, check out the #writing-and-drafts channel on the <a href="https://eatonphil.com/discord.html">Software Internals Discord</a>.</p> <h3 id="is-it-worth-writing-about?">Is it worth writing about?</h3><p>Well if you come to me I'm almost surely going to say yes. Poor Betteridge.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post as a bit of encouragement to folks who want to write more but imagine a high bar for what&#39;s worthwhile.<br><br>tldr; if you ask me it&#39;s almost always going to be a yes. And I think there&#39;s a path toward a higher-quality internet.<a href="https://t.co/Nn6BvXhNdZ">https://t.co/Nn6BvXhNdZ</a> <a href="https://t.co/KELvsxnr2w">pic.twitter.com/KELvsxnr2w</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1598441836284203011?ref_src=twsrc%5Etfw">December 1, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/is-it-worth-writing-about.htmlThu, 01 Dec 2022 00:00:00 +0000A Programmer-Friendly I/O Abstraction Over io_uring and kqueuehttp://notes.eatonphil.com/a-friendly-abstraction-over-iouring-and-kqueue.html<head> <meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/'" /> </head><p>This is an external post of mine. 
Click <a href="https://tigerbeetle.com/blog/a-friendly-abstraction-over-iouring-and-kqueue/">here</a> if you are not redirected.</p> http://notes.eatonphil.com/a-friendly-abstraction-over-iouring-and-kqueue.htmlWed, 23 Nov 2022 00:00:00 +0000Writing a SQL database, take two: Zig and RocksDBhttp://notes.eatonphil.com/zigrocks-sql.html<p>For my second project while learning Zig, I decided to port an old, minimal SQL database project from Go to Zig.</p> <p>In this post, in ~1700 lines of code (yes, I'm sorry it's bigger than my usual), we'll create a basic embedded SQL database in Zig on top of RocksDB. Other than the RocksDB layer it will not use third-party libraries.</p> <p>The code for this project is available on <a href="https://github.com/eatonphil/zigrocks">GitHub</a>.</p> <p>Here are a few example interactions we'll support:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;CREATE TABLE y (year int, age int, name text)&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;CREATE TABLE y (year int, age int, name text)&quot;</span> <span class="n">ok</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;INSERT INTO y VALUES (2010, 38, &#39;Gary&#39;)&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;INSERT INTO y VALUES (2010, 38, &#39;Gary&#39;)&quot;</span> <span class="n">ok</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;INSERT INTO y VALUES (2021, 92, 
&#39;Teej&#39;)&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;INSERT INTO y VALUES (2021, 92, &#39;Teej&#39;)&quot;</span> <span class="n">ok</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;INSERT INTO y VALUES (1994, 18, &#39;Mel&#39;)&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;INSERT INTO y VALUES (1994, 18, &#39;Mel&#39;)&quot;</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="n">Basic</span><span class="w"> </span><span class="n">query</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;SELECT name, age, year FROM y&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;SELECT name, age, year FROM y&quot;</span> <span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="k">year</span><span class="w"> </span><span class="o">|</span> <span class="o">+</span><span class="w"> </span><span class="o">====</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+====</span><span class="w"> </span><span class="o">+</span> <span class="o">|</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="mi">1994</span><span class="w"> </span><span class="o">|</span> <span class="o">|</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span 
class="o">|</span><span class="mi">38</span><span class="w"> </span><span class="o">|</span><span class="mi">2010</span><span class="w"> </span><span class="o">|</span> <span class="o">|</span><span class="w"> </span><span class="n">Teej</span><span class="w"> </span><span class="o">|</span><span class="mi">92</span><span class="w"> </span><span class="o">|</span><span class="mi">2021</span><span class="w"> </span><span class="o">|</span> <span class="o">#</span><span class="w"> </span><span class="k">With</span><span class="w"> </span><span class="k">WHERE</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;SELECT name, year, age FROM y WHERE age &lt; 40&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;SELECT name, year, age FROM y WHERE age &lt; 40&quot;</span> <span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="k">year</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span> <span class="o">+</span><span class="w"> </span><span class="o">====</span><span class="w"> </span><span class="o">+====</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+</span> <span class="o">|</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">1994</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span> <span class="o">|</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span class="o">|</span><span class="mi">2010</span><span class="w"> </span><span class="o">|</span><span class="mi">38</span><span class="w"> 
</span><span class="o">|</span> <span class="o">#</span><span class="w"> </span><span class="k">With</span><span class="w"> </span><span class="n">operations</span> <span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">main</span><span class="w"> </span><span class="c1">--database data --script &lt;(echo &quot;SELECT &#39;Name: &#39; || name, year + 30, age FROM y WHERE age &lt; 40&quot;)</span> <span class="n">echo</span><span class="w"> </span><span class="ss">&quot;SELECT &#39;Name: &#39; || name, year + 30, age FROM y WHERE age &lt; 40&quot;</span> <span class="o">|</span><span class="w"> </span><span class="k">unknown</span><span class="w"> </span><span class="o">|</span><span class="k">unknown</span><span class="w"> </span><span class="o">|</span><span class="n">age</span><span class="w"> </span><span class="o">|</span> <span class="o">+</span><span class="w"> </span><span class="o">=======</span><span class="w"> </span><span class="o">+=======</span><span class="w"> </span><span class="o">+===</span><span class="w"> </span><span class="o">+</span> <span class="o">|</span><span class="w"> </span><span class="n">Name</span><span class="p">:</span><span class="w"> </span><span class="n">Mel</span><span class="w"> </span><span class="o">|</span><span class="mi">2024</span><span class="w"> </span><span class="o">|</span><span class="mi">18</span><span class="w"> </span><span class="o">|</span> <span class="o">|</span><span class="w"> </span><span class="n">Name</span><span class="p">:</span><span class="w"> </span><span class="n">Gary</span><span class="w"> </span><span class="o">|</span><span class="mi">2040</span><span class="w"> </span><span class="o">|</span><span class="mi">38</span><span class="w"> </span><span class="o">|</span> </pre></div> <p>This post is standalone (except for the RocksDB layer which I <a href="https://notes.eatonphil.com/zigrocks.html">wrote about here</a>) but it builds on a 
number of ideas I've explored that you may be interested in:</p> <ul> <li><a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">What's the big deal about key-value databases like FoundationDB and RocksDB?</a></li> <li><a href="https://notes.eatonphil.com/distributed-postgres.html">Let's build a distributed Postgres proof of concept</a></li> <li><a href="https://notes.eatonphil.com/documentdb.html">Writing a document database from scratch in Go</a></li> <li>And the grandfather series, <a href="https://notes.eatonphil.com/database-basics.html">Writing a SQL database from scratch in Go</a></li> </ul> <p>This project is mostly a port of my <a href="https://notes.eatonphil.com/database-basics.html">SQL database from scratch in Go</a> project, but unlike that series this project will have persistent storage via RocksDB.</p> <p>And unlike that post, this project is written in Zig!</p> <p>Let's get started. :)</p> <h3 id="components">Components</h3><p>We're going to split up the project into the following major components:</p> <ul> <li>Lexing</li> <li>Parsing</li> <li>Storage<ul> <li>RocksDB</li> </ul> </li> <li>Execution</li> <li>Entrypoint (<code>main</code>)</li> </ul> <p><em>Lexing</em> takes a query and breaks it into an array of tokens.</p> <p><em>Parsing</em> takes the lexed array of tokens and pattern matches it into an abstract syntax tree (AST).</p> <p><em>Storage</em> maps high-level SQL entities like tables and rows into bytes that can be easily stored on disk. And it handles recovering high-level tables and rows from bytes on disk.</p> <p>Invisible to users of the <em>Storage</em> component is <em>RocksDB</em>, which is how the bytes are actually stored on disk. <a href="http://rocksdb.org/">RocksDB</a> is a persistent store that maps arbitrary byte keys to arbitrary byte values. 
We'll use it for storing and recovering both table metadata and actual row data.</p> <p><em>Execution</em> takes a query AST and executes it against <em>Storage</em>, potentially returning result rows.</p> <p>These terms are a vast simplification of real-world database design. But they are a helpful structure to have even in a project this small.</p> <h3 id="memory-management">Memory Management</h3><p>Zig doesn't have a garbage collector. Mitchell Hashimoto <a href="https://github.com/mitchellh/zig-libgc">wrote bindings to Boehm GC</a>. But Zig also has a <a href="https://ziglang.org/documentation/master/#toc-Choosing-an-Allocator">built-in arena allocator</a> (<code>std.heap.ArenaAllocator</code>), which is perfect for this simple project.</p> <p>The <code>main</code> function will create the arena and pass it to each component, and each component can allocate from it as it pleases. At the end of <code>main</code>, the entire arena will be freed at once.</p> <p>The only other place where we must do manual memory management is in the RocksDB wrapper. But <a href="https://notes.eatonphil.com/zigrocks.html">I've already covered</a> that in a separate post.</p> <h3 id="zig-specifics">Zig Specifics</h3><p>I'm not going to cover the basics of Zig syntax. If you are new to Zig, read <a href="https://notes.eatonphil.com/zigrocks.html">this</a> first! 
(It's short.)</p> <p>Now that we've got the basic idea, we can start coding!</p> <h3 id="types-(<code>types.zig</code>,-10-loc)">Types (<code>types.zig</code>, 10 LoC)</h3><p>Let's create a few helper types that we'll use in the rest of the code.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">;</span> <span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">String</span><span class="p">;</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="kr">comptime</span><span class="w"> </span><span class="n">T</span><span class="o">:</span><span class="w"> </span><span class="kt">type</span><span class="p">)</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="n">T</span><span class="p">,</span> <span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="n">Error</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>That's it. 
:) Makes things a little more readable.</p> <h3 id="lexing-(<code>lex.zig</code>,-308-loc)">Lexing (<code>lex.zig</code>, 308 LoC)</h3><p>Lexing turns a query string into an array of tokens.</p> <p>There are a few <em>kinds</em> of tokens we'll define:</p> <ul> <li>Keywords (like <code>CREATE</code>, <code>true</code>, <code>false</code>, <code>null</code>)<ul> <li>Syntax (commas, parentheses, operators, and all other builtin symbols)</li> </ul> </li> <li>Strings</li> <li>Integers</li> <li>Identifiers</li> </ul> <p>And not listed there but important to <em>skip past</em> is whitespace.</p> <p>Let's turn this into a Zig struct!</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;types.zig&quot;</span><span class="p">).</span><span class="n">Error</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;types.zig&quot;</span><span class="p">).</span><span class="n">String</span><span class="p">;</span> <span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">start</span><span class="o">:</span><span class="w"> </span><span 
class="kt">u64</span><span class="p">,</span> <span class="w"> </span><span class="n">end</span><span class="o">:</span><span class="w"> </span><span class="kt">u64</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Kind</span><span class="p">,</span> <span class="w"> </span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">enum</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Keywords</span> <span class="w"> </span><span class="n">select_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">create_table_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">insert_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">values_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">from_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">where_keyword</span><span class="p">,</span> <span class="w"> </span><span class="c1">// Operators</span> <span class="w"> </span><span class="n">plus_operator</span><span class="p">,</span> <span class="w"> </span><span class="n">equal_operator</span><span class="p">,</span> <span class="w"> </span><span class="n">lt_operator</span><span class="p">,</span> <span class="w"> </span><span class="n">concat_operator</span><span class="p">,</span> <span class="w"> </span><span class="c1">// Other syntax</span> <span class="w"> </span><span class="n">left_paren_syntax</span><span class="p">,</span> <span class="w"> </span><span 
class="n">right_paren_syntax</span><span class="p">,</span> <span class="w"> </span><span class="n">comma_syntax</span><span class="p">,</span> <span class="w"> </span><span class="c1">// Literals</span> <span class="w"> </span><span class="n">identifier</span><span class="p">,</span> <span class="w"> </span><span class="n">integer</span><span class="p">,</span> <span class="w"> </span><span class="n">string</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">source</span><span class="p">[</span><span class="n">self</span><span class="p">.</span><span class="n">start</span><span class="p">..</span><span class="n">self</span><span class="p">.</span><span class="n">end</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Using an <code>enum</code> helps us with type safety. 
And since we're storing location in the token, we can build a nice debug function for when lexing or parsing fails.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">line</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">lineStartIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">lineEndIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span 
class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">source</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\n&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">lineStartIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">start</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Find the end of the line, or the end of the source if there is no trailing newline</span> <span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">lineEndIndex</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">&#39;\n&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lineEndIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span 
class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{s}</span><span class="se">\n</span><span class="s">Near line {}, column {}.</span><span class="se">\n</span><span class="s">{s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">msg</span><span class="p">,</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">lineStartIndex</span><span class="p">..</span><span class="n">lineEndIndex</span><span class="p">]</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">column</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;^ Near here</span><span class="se">\n\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>And similarly, let's add a debug helper for when we're dealing with an array of tokens.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">preferredIndex</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">preferredIndex</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span 
class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">debug</span><span class="p">(</span><span class="n">msg</span><span class="p">);</span> <span class="p">}</span> </pre></div> <h4 id="token-&lt;&gt;-string-mapping">Token &lt;&gt; String Mapping</h4><p>Before we get too far from the <code>Token</code> definition, let's define a mapping from the <code>Token.Kind</code> enum to the strings that can appear in a query.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">Builtin</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">,</span> <span class="p">};</span> <span class="c1">// These must be sorted by length of the name text, descending, for lexKeyword.</span> <span class="kr">var</span><span class="w"> </span><span class="n">BUILTINS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="n">_</span><span class="p">]</span><span class="n">Builtin</span><span class="p">{</span> <span class="w"> 
</span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;CREATE TABLE&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;INSERT INTO&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;SELECT&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s">&quot;VALUES&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">values_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;WHERE&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">where_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;FROM&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;||&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">concat_operator</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;=&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">equal_operator</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;+&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">plus_operator</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&lt;&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">lt_operator</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span 
class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;(&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;)&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="w"> </span><span class="p">},</span> <span class="p">};</span> </pre></div> <p>We'll use this in a few lexing functions below.</p> <h4 id="whitespace">Whitespace</h4><p>Outside of tokens, we need to be able to skip past whitespace.</p> <div class="highlight"><pre><span></span><span 
class="k">fn</span><span class="w"> </span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Check bounds first so an index at the end of the source is safe</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&#39; &#39;</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\n&#39;</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\t&#39;</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">res</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\r&#39;</span><span class="p">))</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">res</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">res</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>All lexing functions will look like this. They'll take the source as one argument and a cursor (the current index into the source) as another.</p> <h4 id="keywords">Keywords</h4><p>Let's handle lexing keyword tokens next. Keywords are case-insensitive. Zig's standard library has <code>std.ascii.eqlIgnoreCase</code> for ASCII case-insensitive comparison, but a hand-written version is short and shows exactly what is going on. 
So let's write that first.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">asciiCaseInsensitiveEqual</span><span class="p">(</span><span class="n">left</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">right</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">right</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">min</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">right</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">min</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span 
class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">l</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">97</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="mi">122</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">l</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">32</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">right</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">97</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="mi">122</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">32</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">l</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>Unfortunately it only supports ASCII for now.</p> <p>Now we can write a simple longest-matching-substring function. It is simple because the keyword mapping we set up above is already ordered by length descending.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexKeyword</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">longestLen</span><span class="o">:</span><span class="w"> </span><span 
class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">BUILTINS</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">builtin</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Skip builtins that would run past the end of the source</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">asciiCaseInsensitiveEqual</span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">index</span><span class="w"> </span><span class="p">..</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span 
class="p">.</span><span class="n">len</span><span class="p">],</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">longestLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">builtin</span><span class="p">.</span><span class="n">kind</span><span class="p">;</span> <span class="w"> </span><span class="c1">// First match is the longest match</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">longestLen</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">longestLen</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">longestLen</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">kind</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>That's it!</p> <h4 id="integers">Integers</h4><p>For integers we read through the source until we stop seeing decimal digits. 
Obviously this is a subset of what people consider integers, but it will do for now!</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexInteger</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span 
class="s">&#39;0&#39;</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s">&#39;9&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span 
class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">integer</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <h4 id="strings">Strings</h4><p>Strings are enclosed in single quotes.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> 
</span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">&#39;\&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span 
class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="se">&#39;\&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="se">&#39;\&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">string</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <h4 id="identifiers">Identifiers</h4><p>Identifiers for this project are runs of ASCII letters (plus <code>*</code>); digits and underscores are not accepted. We could support more by optionally checking for double-quoted identifiers, but I'll leave that as an exercise for the reader.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">lexIdentifier</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Token</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">end</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="p">((</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s">&#39;a&#39;</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s">&#39;z&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s">&#39;A&#39;</span><span class="w"> </span><span class="k">and</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s">&#39;Z&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="p">(</span><span class="n">source</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&#39;*&#39;</span><span class="p">)))</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">start</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">end</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span 
class="n">source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">start</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">start</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">end</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">end</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <h4 id="<code>lex</code>"><code>lex</code></h4><p>Now we can pull together all these helper functions in a public entrypoint for lexing.</p> <p>It will loop through a query string, eating whitespace and checking for tokens. It will continue until it hits the end of the query string. 
If it ever can't continue it fails.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">source</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">Token</span><span class="p">))</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span 
class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">keywordRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexKeyword</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">keywordRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Failed to allocate space for keyword token&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keywordRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">integerRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexInteger</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="p">(</span><span class="n">integerRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Failed to allocate space for integer token&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">integerRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">stringRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">stringRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span 
class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Failed to allocate space for string token&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stringRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">identifierRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexIdentifier</span><span class="p">(</span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">identifierRes</span><span class="p">.</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Failed to allocate space for identifier token&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">identifierRes</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span 
class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Last good token.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Bad token&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>That's it for lexing! 
Now we can do parsing.</p> <h3 id="parsing-(<code>parse.zig</code>,-407-loc)">Parsing (<code>parse.zig</code>, 407 LoC)</h3><p>Parsing takes the array of tokens produced by the lexing stage and discovers the tree structure in them, mapping it onto a predefined abstract syntax tree (AST).</p> <p>If it can't discover a valid tree from the array of tokens, it fails.</p> <p>Let's set up the basics of the <code>Parser</code> struct:</p> <div class="highlight"><pre><span></span><span class="n">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">@import</span><span class="p">(</span><span class="ss">&quot;std&quot;</span><span class="p">);</span> <span class="n">const</span><span class="w"> </span><span class="n">lex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">@import</span><span class="p">(</span><span class="ss">&quot;lex.zig&quot;</span><span class="p">);</span> <span class="n">const</span><span class="w"> </span><span class="k">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">@import</span><span class="p">(</span><span class="ss">&quot;types.zig&quot;</span><span class="p">).</span><span class="k">Result</span><span class="p">;</span> <span class="n">const</span><span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">Token</span><span class="p">;</span> <span class="n">pub</span><span class="w"> </span><span class="n">const</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="nl">allocator</span><span class="p">:</span><span class="w"> </span><span 
class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="nl">allocator</span><span class="p">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">)</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Parser</span><span class="err">{</span><span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="err">}</span><span class="p">;</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="nl">tokens</span><span class="p">:</span><span class="w"> </span><span class="err">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="k">index</span><span class="err">:</span><span class="w"> </span><span class="n">usize</span><span class="p">,</span><span class="w"> </span><span class="nl">kind</span><span class="p">:</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">)</span><span class="w"> </span><span class="n">bool</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">index</span><span 
class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="nf">len</span><span class="p">)</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">false</span><span class="p">;</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">tokens</span><span class="o">[</span><span class="n">index</span><span class="o">]</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">kind</span><span class="p">;</span> <span class="w"> </span><span class="err">}</span> </pre></div> <h4 id="expressions">Expressions</h4><p>Expressions are at the bottom of the syntax tree.</p> <p>They can be:</p> <ul> <li>Literals (like strings, integers, booleans, etc.)</li> <li>Or binary operations</li> </ul> <p>Let's define these in Zig:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">operator</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">left</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span class="n">right</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span 
class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">left</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; {s} &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">literal</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">binary_operation</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">,</span> <span class="w"> 
</span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">literal</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">literal</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;&#39;{s}&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">literal</span><span class="p">.</span><span class="n">string</span><span class="p">()}),</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">literal</span><span 
class="p">.</span><span class="n">string</span><span class="p">()}),</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>Now we can attempt to parse either of these from an array of tokens.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseExpression</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span class="n">nextPosition</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span 
class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">e</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">integer</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">string</span><span class="p">))</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">ExpressionAST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;No expression&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">equal_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span 
class="n">Kind</span><span class="p">.</span><span class="n">lt_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">plus_operator</span><span class="p">)</span><span class="w"> </span><span class="k">or</span> <span class="w"> </span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">concat_operator</span><span class="p">))</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">newE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ExpressionAST</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">BinaryOperationAST</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">operator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="w"> </span><span class="p">.</span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span 
class="n">allocator</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for left expression.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">.</span><span class="n">create</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for right expression.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">newE</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">left</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e</span><span class="p">;</span> <span class="w"> </span><span class="n">e</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">newE</span><span class="p">;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="n">binary_operation</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="o">*</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span 
class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">e</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">nextPosition</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Basically, we assume it's a literal expression unless we see an operator after it. 
If there's an operator after it, we call <code>parseExpression</code> recursively to parse the right-hand side and return a binary operation.</p> <p>Important to note: this skips both implicit operator precedence and explicit precedence via parentheses. Every chain of operators simply nests to the right, so <code>a + b = c</code> parses as <code>a + (b = c)</code>.</p> <h4 id="<code>select</code>"><code>SELECT</code></h4><p>A <code>SELECT</code> query has a comma-separated list of expressions (the columns), a <code>FROM</code> table name, and an optional <code>WHERE</code> clause with one more expression as the filter.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">SelectAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span class="n">from</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">where</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;SELECT</span><span class="se">\n</span><span class="s">&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span 
class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;FROM</span><span class="se">\n</span><span class="s"> {s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">where</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">where</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">WHERE</span><span class="se">\n</span><span class="s"> &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="n">where</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="p">};</span> </pre></div> <p>To parse it we look for:</p> <ul> <li><code>SELECT</code></li> <li>Then a comma-separated list of <code>ExpressionAST</code>s</li> <li>Then a <code>FROM</code> and a table name</li> <li>Then optionally a <code>WHERE</code><ul> <li>And then another <code>ExpressionAST</code></li> </ul> </li> </ul> <p>With the help of <code>expectTokenKind</code> and <code>parseExpression</code> this is not too difficult to write, just a little verbose.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseSelect</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected SELECT keyword&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">where</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span> <span 
class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="c1">// Parse columns</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma.</span><span class="se">\n</span><span class="s">&quot;</span><span 
class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected comma.&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for token.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">from_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected FROM keyword after this.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected FROM keyword&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected FROM table name after this.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected FROM table name&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">from</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">where_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// i + 1, skip past the where</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> 
</span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">where</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Unexpected token.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span 
class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Did not complete parsing SELECT&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">select</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>That's it!</p> <h4 id="<code>create-table</code>"><code>CREATE TABLE</code></h4><p>A <code>CREATE TABLE</code> query has a table name and a comma-separated list of identifier pairs, each giving a column's name and kind.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">CreateTableColumnAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span><span class="o">:</span><span class="w"> </span><span 
class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">CreateTableColumnAST</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;CREATE TABLE {s} (</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span 
class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot; {s} {s}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">string</span><span class="p">()</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">columns</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;)</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>To parse it we look for:</p> <ul> <li><code>CREATE TABLE</code></li> <li>Followed by an identifier (the table name)</li> <li>Followed by an open parenthesis</li> <li>Followed by a comma-separated list of identifier pairs</li> <li>Followed by a close parenthesis</li> </ul> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseCreateTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span 
class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected CREATE TABLE keyword&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected table name after CREATE TABLE keyword.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected CREATE TABLE name&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">CreateTableColumnAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">create_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected opening paren after CREATE TABLE name.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected opening paren&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span 
class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected comma.&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">CreateTableColumnAST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected column name after comma.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> 
</span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected identifier.&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected column type after column name.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="s">&quot;Expected identifier.&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for column.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Skip past final paren.</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">tokens</span><span 
class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Unexpected token.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Did not complete parsing CREATE TABLE&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">create_table</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">create_table</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="<code>insert-into</code>"><code>INSERT INTO</code></h4><p>And last we've got <code>INSERT INTO</code>. 
This tree has a table name and a list of expressions to insert into the table.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">InsertAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Token</span><span class="p">,</span> <span class="w"> </span><span class="n">values</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">ExpressionAST</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;INSERT INTO {s} VALUES (&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">self</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">values</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;, &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;)</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>We parse it by looking for:</p> <ul> <li><code>INSERT INTO</code></li> <li>Followed by a table name</li> <li>Followed by <code>VALUES</code></li> <li>Followed by open parenthesis</li> <li>Followed by a comma-separated list of expressions</li> <li>Followed by a close parenthesis</li> </ul> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parseInsert</span><span class="p">(</span><span class="n">self</span><span 
class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected INSERT INTO keyword&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">identifier</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected table name after INSERT INTO keyword.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected INSERT INTO table name&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">ExpressionAST</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">insert</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">values_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected VALUES keyword.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span 
class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected VALUES keyword&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">left_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected opening paren after VALUES keyword.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected opening paren&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">right_paren_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">values</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">comma_syntax</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected 
comma.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Expected comma.&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseExpression</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for expression.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">nextPosition</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Skip past final paren.</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Unexpected token.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Did not complete parsing INSERT INTO&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">insert</span><span class="p">.</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">AST</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">insert</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="<code>ast</code>"><code>AST</code></h4><p>Finally we can define the top-level SQL <code>AST</code> as being the union of the above three query types.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">AST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">select</span><span class="o">:</span><span class="w"> </span><span class="n">SelectAST</span><span class="p">,</span> <span class="w"> </span><span class="n">insert</span><span class="o">:</span><span class="w"> </span><span class="n">InsertAST</span><span class="p">,</span> <span class="w"> </span><span class="n">create_table</span><span class="o">:</span><span class="w"> </span><span class="n">CreateTableAST</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">print</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">select</span><span class="o">|</span><span class="w"> </span><span class="n">select</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span> <span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">insert</span><span class="o">|</span><span class="w"> </span><span class="n">insert</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span> <span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=&gt;</span><span 
class="w"> </span><span class="o">|</span><span class="n">create_table</span><span class="o">|</span><span class="w"> </span><span class="n">create_table</span><span class="p">.</span><span class="n">print</span><span class="p">(),</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>And we can implement <code>parse</code> by switching on the current token.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">Token</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">select_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseSelect</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">create_table_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseCreateTable</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">expectTokenKind</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="n">Token</span><span class="p">.</span><span class="n">Kind</span><span class="p">.</span><span class="n">insert_keyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">parseInsert</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Unknown statement&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>Perfect. For today. 
:)</p> <h3 id="storage-(<code>storage.zig</code>,-338-loc)">Storage (<code>storage.zig</code>, 338 LoC)</h3><p>Next we're going to switch contexts completely and think about how tables and rows will get serialized into bytes that can be stored on disk.</p> <p>The storage layer will define a few general helpers for correctly serializing and deserializing strings and numbers:</p> <div class="highlight"><pre><span></span><span class="k">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">&quot;std&quot;</span><span class="p">);</span> <span class="k">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">&quot;rocksdb.zig&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">RocksDB</span><span class="p">;</span> <span class="k">const</span><span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">&quot;types.zig&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">Error</span><span class="p">;</span> <span class="k">const</span><span class="w"> </span><span class="n">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">&quot;types.zig&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">Result</span><span class="p">;</span> <span class="k">const</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="err">@</span><span class="n">import</span><span class="p">(</span><span class="s2">&quot;types.zig&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">String</span><span class="p">;</span> <span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="n">comptime</span><span class="w"> </span><span class="n">T</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="o">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">u8</span><span class="p">),</span><span class="w"> </span><span class="n">i</span><span class="p">:</span><span class="w"> </span><span class="n">T</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="nb nb-Type">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">length</span><span class="p">:</span><span class="w"> </span><span class="p">[</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]</span><span class="n">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">undefined</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">writeIntBig</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">length</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span 
class="p">);</span> <span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">buf</span><span class="o">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">length</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]);</span> <span class="p">}</span> <span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="n">comptime</span><span class="w"> </span><span class="n">T</span><span class="p">:</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="n">T</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">readIntBig</span><span class="p">(</span><span class="n">T</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="err">@</span><span class="n">sizeOf</span><span class="p">(</span><span class="n">T</span><span class="p">)]);</span> <span class="p">}</span> <span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="n">buf</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="o">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">u8</span><span class="p">),</span><span class="w"> </span><span 
class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="nb nb-Type">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="n">u64</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">bytes</span><span class="o">.</span><span class="n">len</span><span class="p">);</span> <span class="w"> </span><span class="n">try</span><span class="w"> </span><span class="n">buf</span><span class="o">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">bytes</span><span class="p">);</span> <span class="p">}</span> <span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">)</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">offset</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="p">,</span> <span class="w"> </span><span class="n">bytes</span><span class="p">:</span><span class="w"> </span><span class="nb nb-Type">String</span><span class="p">,</span> <span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="n">u64</span><span 
class="p">,</span><span class="w"> </span><span class="n">bytes</span><span class="p">);</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">8</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">.</span><span class="p">{</span><span class="w"> </span><span class="o">.</span><span class="n">offset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">offset</span><span class="p">,</span><span class="w"> </span><span class="o">.</span><span class="n">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">bytes</span><span class="p">[</span><span class="mf">8.</span><span class="o">.</span><span class="n">offset</span><span class="p">]</span><span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Then we'll define the <code>Storage</code> struct itself. 
Under the hood it will use RocksDB to store and recover data on disk.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">)</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span 
class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now let's think about storage entities.</p> <h4 id="values">Values</h4><p>The fundamental unit in the database is a value, or cell. It can be a boolean, an integer, a string, or null, which we model as a tagged union.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">union</span><span class="p">(</span><span class="k">enum</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">bool_value</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span> <span class="w"> </span><span class="n">null_value</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span> <span class="w"> </span><span class="n">string_value</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="n">integer_value</span><span class="o">:</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">TRUE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span 
class="kr">const</span><span class="w"> </span><span class="n">FALSE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">NULL</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">};</span> </pre></div> <p>Since all values are strings in the original query, we'll provide a <code>fromIntegerString</code> helper to convert integer literals. For simplicity, it falls back to <code>0</code> when the string doesn't parse as a base-10 integer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">iBytes</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fmt</span><span class="p">.</span><span class="n">parseInt</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">iBytes</span><span 
class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next we'll define functions to cast values to boolean.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asBool</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span 
class="n">bool_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>To strings.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asString</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">))</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="c1">// Do nothing</span> <span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="s">&quot;false&quot;</span><span class="p">),</span> <span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">value</span><span class="p">),</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span 
class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{d}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">value</span><span class="p">}),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And to integers.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">asInteger</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span 
class="n">string_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">value</span><span class="p">).</span><span class="n">integer_value</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="n">value</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And finally the storage layer's core concern: serialization...</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">serialize</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">))</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">buf</span><span 
class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">&#39;0&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">&#39;1&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="s">&#39;1&#39;</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="s">&#39;0&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buf</span><span 
class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">&#39;2&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">value</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="s">&#39;3&#39;</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">serializeInteger</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">buf</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And deserialization.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">deserialize</span><span class="p">(</span><span class="n">data</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s">&#39;0&#39;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">,</span> <span class="w"> </span><span class="s">&#39;1&#39;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&#39;1&#39;</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="s">&#39;2&#39;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span 
class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">..]</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="s">&#39;3&#39;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeInteger</span><span class="p">(</span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="mi">1</span><span class="p">..])</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>We use a simple, space-inefficient scheme for encoding and decoding values as bytes that can be written to disk: a one-byte ASCII type tag (<code>'0'</code> for null, <code>'1'</code> for boolean, <code>'2'</code> for string, <code>'3'</code> for integer) followed by the payload, if any.</p> <h4 id="rows">Rows</h4><p>Now that we've got values, we can define rows in terms of values. 
And we can provide a few helper functions for getting cells by field name.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">cells</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">),</span> <span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Row</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Row</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span 
class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">cells</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">),</span> <span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fields</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="o">:</span><span class="w"> </span><span class="n">Value</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">cellBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> 
</span><span class="k">try</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cell</span><span class="p">.</span><span class="n">serialize</span><span class="p">(</span><span class="o">&amp;</span><span class="n">cellBuffer</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">appendBytes</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">cell</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">,</span><span class="w"> </span><span class="n">field</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">f</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">field</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Results are internal buffer views. So make a copy.</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">copy</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span 
class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">deserialize</span><span class="p">(</span><span class="n">copy</span><span class="p">.</span><span class="n">items</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">items</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">reset</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">cells</span><span 
class="p">.</span><span class="n">clearRetainingCapacity</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>Since values are serialized with length prefixes, we can serialize a row by concatenating all the values together.</p> <p>Since we must map to keys and values for RocksDB, we give each row a key prefix that is the table name. And then we give it a random suffix to distinguish it from other rows in the table. A more intelligent design would use the table's primary key as the suffix but we don't support primary keys yet. (See also the section on "Mapping SQL to key-value storage" in <a href="https://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html">What's the big deal about key-value databases like FoundationDB and RocksDB?</a>.)</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">generateId</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFileZ</span><span class="p">(</span><span class="s">&quot;/dev/random&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> 
</span><span class="n">buf</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="mi">16</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">read</span><span class="p">(</span><span class="o">&amp;</span><span class="n">buf</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Return the array by value: a slice of buf would dangle after return.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">buf</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeRow</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Table name prefix</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span 
class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;row_{s}_&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate row key&quot;</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Unique row id</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">generateId</span><span class="p">()</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not generate id&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">appendSlice</span><span class="p">(</span><span class="o">&amp;</span><span class="n">id</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate for id&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span 
class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">row</span><span class="p">.</span><span class="n">cells</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">cell</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">cell</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate for cell&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">items</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="rowiter">RowIter</h4><p>Reading rows will be slightly different from writing rows since reading rows will use an iterator. 
We will wrap the RocksDB iterator so the consumer of <code>Storage</code> only needs to deal with <code>Row</code>s and <code>Value</code>s.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">RowIter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Row</span><span class="p">,</span> <span class="w"> </span><span class="n">iter</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">Iter</span><span class="p">,</span> <span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">Iter</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">RowIter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">RowIter</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">iter</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">fields</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">next</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Row</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rowBytes</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">b</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rowBytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">value</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">.</span><span class="n">reset</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">offset</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">offset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">rowBytes</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">rowBytes</span><span class="p">[</span><span class="n">offset</span><span class="p">..]);</span> <span class="w"> </span><span class="n">offset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">d</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">.</span><span class="n">appendBytes</span><span class="p">(</span><span class="n">d</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span 
class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">row</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>It reverses what <code>writeRow</code> did, deserializing cells one after another. Again, this works because each cell is length-prefixed.</p> <p>Next we must provide the interface for actually getting a <code>RowIter</code>. 
For now, the only requirement is that the <code>RowIter</code> yields every row in the table.</p> <p>Since we wrote each row with a table name prefix, we can recover a table's rows by iterating over all keys with that prefix.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getRowIter</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">RowIter</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rowPrefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">rowPrefix</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;row_{s}_&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for row prefix&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">iter</span><span class="p">(</span><span class="n">rowPrefix</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">it</span><span class="o">|</span><span class="w"> </span><span class="n">it</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tableInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span 
class="n">getTable</span><span class="p">(</span><span class="n">table</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">t</span><span class="o">|</span><span class="w"> </span><span class="n">t</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RowIter</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="n">tableInfo</span><span class="p">.</span><span class="n">columns</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="tables">Tables</h4><p>Finally we've got tables. 
We must store table metadata: its name, columns and column types.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="n">columns</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="n">types</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>We will use a <code>tbl_</code> prefix instead of <code>row_</code> prefix for table metadata. 
But we'll otherwise encode with the same length-prefixed concatenations.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">writeTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="o">:</span><span class="w"> </span><span class="n">Table</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="n">Error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Table name prefix</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;tbl_{s}_&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">table</span><span class="p">.</span><span class="n">name</span><span class="p">})</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate key for table&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">column</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate for column&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">serializeBytes</span><span class="p">(</span><span class="o">&amp;</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">table</span><span class="p">.</span><span class="n">types</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Could not allocate for column type&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">self</span><span 
class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">items</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And the opposite for decoding.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">getTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">Table</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tableKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">tableKey</span><span class="p">.</span><span class="n">writer</span><span class="p">().</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;tbl_{s}_&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">name</span><span class="p">})</span><span class="w"> 
</span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for table prefix&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Table</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="c1">// First grab table info</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columnInfo</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">tableKey</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">val</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">not_found</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;No such table&quot;</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columnOffset</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">columnOffset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">columnInfo</span><span class="p">.</span><span class="n">len</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">columnInfo</span><span class="p">[</span><span class="n">columnOffset</span><span class="p">..]);</span> <span class="w"> </span><span class="n">columnOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">column</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span> <span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for column name.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">deserializeBytes</span><span class="p">(</span><span class="n">columnInfo</span><span class="p">[</span><span class="n">columnOffset</span><span class="p">..]);</span> <span class="w"> </span><span class="n">columnOffset</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">kind</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span> <span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">kind</span><span class="p">.</span><span class="n">bytes</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for column kind.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">table</span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="n">table</span><span class="p">.</span><span 
class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>And that's it for storage! Again, we're building on top of the <a href="https://notes.eatonphil.com/zigrocks.html">RocksDB layer</a> I already wrote about. If you want to see how that works, go for it!</p> <p>If you just want the <code>rocksdb.zig</code> file, grab it from <a href="https://github.com/eatonphil/zigrocks/blob/7831e390f4044bb999507fd6d0e23bb2475756f8/rocksdb.zig">here</a>.</p> <h3 id="execute-(<code>execute.zig</code>,-210-loc)">Execute (<code>execute.zig</code>, 210 LoC)</h3><p>Now that we've got a storage layer and an AST from our parser, we can execute the query on top of the storage!</p> <p>A better implementation might translate the AST to bytecode and implement a bytecode interpreter for expression evaluation. 
But we'll build a tree-walking interpreter instead.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">Parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;parse.zig&quot;</span><span class="p">).</span><span class="n">Parser</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;rocksdb.zig&quot;</span><span class="p">).</span><span class="n">RocksDB</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;storage.zig&quot;</span><span class="p">).</span><span class="n">Storage</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">Result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;types.zig&quot;</span><span class="p">).</span><span class="n">Result</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;types.zig&quot;</span><span class="p">).</span><span 
class="n">String</span><span class="p">;</span> <span class="kr">pub</span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">Executor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span> <span class="w"> </span><span class="n">storage</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">,</span> <span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="o">:</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">Allocator</span><span class="p">,</span><span class="w"> </span><span class="n">storage</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">)</span><span class="w"> </span><span class="n">Executor</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Executor</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="p">};</span> <span 
class="w"> </span><span class="p">}</span> </pre></div> <p>Not every query returns data, so a query response is either flagged as empty or it holds an array of strings (column names) and an array of arrays of strings (rows, each an array of cells).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">QueryResponse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">fields</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="c1">// Array of rows (each row is an array of cells (each cell is an array of u8))</span> <span class="w"> </span><span class="n">rows</span><span class="o">:</span><span class="w"> </span><span class="p">[][]</span><span class="n">String</span><span class="p">,</span> <span class="w"> </span><span class="n">empty</span><span class="o">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Result</span><span class="p">(</span><span class="n">QueryResponse</span><span class="p">);</span> </pre></div> <h4 id="expressions">Expressions</h4><p>For execution we start again at the bottom with expressions. 
There are literals.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeExpression</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">ExpressionAST</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="o">:</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">)</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">lit</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">string</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">integer</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">fromIntegerString</span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()),</span> <span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">row</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">()),</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">unreachable</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> </pre></div> <p>And there are a handful of binary operations.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">.</span><span class="n">binary_operation</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">bin_op</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">left</span><span 
class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">right</span><span class="p">.</span><span class="o">*</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">equal_operator</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Cast dissimilar types to serde</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">@enumToInt</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">@enumToInt</span><span class="p">(</span><span class="n">right</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">leftBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span 
class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">leftBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">leftBuf</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rightBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rightBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span 
class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rightBuf</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">null_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">bool_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asBool</span><span class="p">(),</span> <span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">blk</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">leftBuf</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">leftBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rightBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">rightBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="o">:</span><span class="n">blk</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">leftBuf</span><span class="p">.</span><span 
class="n">items</span><span class="p">,</span><span class="w"> </span><span class="n">rightBuf</span><span class="p">.</span><span class="n">items</span><span class="p">);</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">(),</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="p">.</span><span class="n">concat_operator</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">copy</span><span class="p">)</span><span class="w"> </span><span 
class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">copy</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">string_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">copy</span><span class="p">.</span><span class="n">items</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">bin_op</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">lt_operator</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">())</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span 
class="n">TRUE</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">FALSE</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">plus_operator</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">integer_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">left</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">right</span><span class="p">.</span><span class="n">asInteger</span><span class="p">()</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Value</span><span class="p">.</span><span class="n">NULL</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="<code>select</code>"><code>SELECT</code></h4><p>To execute a <code>SELECT</code> query we first validate the requested table and requested fields.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeSelect</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> 
</span><span class="n">s</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">SelectAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">getTable</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Now validate and store requested fields</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">requestedFields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span 
class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">requestedColumn</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">fieldName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">requestedColumn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">literal</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">lit</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">lit</span><span class="p">.</span><span class="n">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">identifier</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">lit</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span> <span class="w"> </span><span class="c1">// TODO: give reasonable names</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span 
class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="c1">// TODO: give reasonable names</span> <span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="s">&quot;unknown&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">requestedFields</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">fieldName</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for requested field.&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then grab an iterator for rows in the table.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">([]</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">QueryResponse</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span 
class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">requestedFields</span><span class="p">.</span><span class="n">items</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">getRowIter</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">from</span><span class="p">.</span><span class="n">string</span><span class="p">()))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span 
class="o">|</span><span class="n">it</span><span class="o">|</span><span class="w"> </span><span class="n">it</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> </pre></div> <p>And finally we iterate through all rows and add a row to the response if there is no <code>WHERE</code> condition or if the <code>WHERE</code> condition evaluates to true for that row.</p> <p>When we add rows to the response, we need to actually evaluate the expression for each column in the <code>SELECT</code> AST.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">row</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">where</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">where</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">where</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span 
class="p">).</span><span class="n">asBool</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">add</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">add</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">requested</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">exp</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span 
class="n">executeExpression</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span><span class="w"> </span><span class="n">row</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">valBuf</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="kt">u8</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="n">val</span><span class="p">.</span><span class="n">asString</span><span class="p">(</span><span class="o">&amp;</span><span class="n">valBuf</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">unreachable</span><span class="p">;</span> <span class="w"> </span><span class="n">requested</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">valBuf</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for requested cell&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">rows</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">requested</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span 
class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for row&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">response</span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rows</span><span class="p">.</span><span class="n">items</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="<code>insert-into</code>"><code>INSERT INTO</code></h4><p>Inserting is pretty simple: we evaluate each of the <code>VALUES</code> expressions passed in and write the resulting row to storage.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeInsert</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">InsertAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">emptyRow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="kc">undefined</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">row</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Row</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="kc">undefined</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">executeExpression</span><span class="p">(</span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">emptyRow</span><span class="p">);</span> <span class="w"> </span><span class="n">row</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">exp</span><span class="p">)</span><span class="w"> 
</span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for cell&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">writeRow</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span><span class="w"> </span><span class="n">row</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span 
class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="<code>create-table</code>"><code>CREATE TABLE</code></h4><p>Similarly to <code>INSERT INTO</code>, but without any expression evaluation, we map the <code>CreateTableAST</code> to <code>Storage</code> entities and write them to storage.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">executeCreateTable</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">CreateTableAST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">types</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">String</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="p">.</span><span class="n">columns</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">column</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">string</span><span class="p">())</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for column name&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">column</span><span class="p">.</span><span class="n">kind</span><span class="p">.</span><span class="n">string</span><span class="p">())</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not allocate for column kind&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span class="p">.</span><span class="n">Table</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">table</span><span class="p">.</span><span class="n">string</span><span class="p">(),</span> <span class="w"> </span><span class="p">.</span><span class="n">columns</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">columns</span><span class="p">.</span><span class="n">items</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">types</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">types</span><span class="p">.</span><span class="n">items</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">storage</span><span class="p">.</span><span class="n">writeTable</span><span class="p">(</span><span class="n">table</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span 
class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">fields</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">rows</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">empty</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>For both <code>CREATE TABLE</code> and <code>INSERT INTO</code> there is more validation we could do. Exercise for the reader and whatnot. 
:)</p> <h4 id="<code>execute</code>"><code>execute</code></h4><p>Finally we can switch on the <code>AST</code> and call the appropriate execution function.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">execute</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Executor</span><span class="p">,</span><span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">Parser</span><span class="p">.</span><span class="n">AST</span><span class="p">)</span><span class="w"> </span><span class="n">QueryResponseResult</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">select</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">select</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeSelect</span><span class="p">(</span><span class="n">select</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">insert</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">insert</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeInsert</span><span class="p">(</span><span class="n">insert</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> 
</span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">create_table</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">createTable</span><span class="o">|</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">executeCreateTable</span><span class="p">(</span><span class="n">createTable</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> 
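<p>As a side note, each arm of the switch above unwraps a result only to wrap it again in the same union. Since all three helpers already return <code>QueryResponseResult</code>, the dispatch could probably return their results directly. Here is a minimal, self-contained sketch of that shape. The <code>Result</code> and <code>AST</code> types and the three stub functions are hypothetical stand-ins I made up for illustration, not the project's real definitions.</p>
<div class="highlight"><pre>const std = @import("std");

// Hypothetical stand-in for QueryResponseResult: a value or an error message.
const Result = union(enum) {
    val: []const u8,
    err: []const u8,
};

// Hypothetical stand-in for Parser.AST: each variant only carries a table name.
const AST = union(enum) {
    select: []const u8,
    insert: []const u8,
    create_table: []const u8,
};

// Stub executors. The real ones talk to storage; these only return a Result.
fn executeSelect(table: []const u8) Result {
    if (table.len == 0) return .{ .err = "No such table" };
    return .{ .val = "select ok" };
}

fn executeInsert(table: []const u8) Result {
    _ = table;
    return .{ .val = "insert ok" };
}

fn executeCreateTable(table: []const u8) Result {
    _ = table;
    return .{ .val = "create ok" };
}

// Every arm already produces a Result, so it is returned as-is
// instead of being unwrapped and wrapped again.
fn execute(ast: AST) Result {
    return switch (ast) {
        .select =&gt; |table| executeSelect(table),
        .insert =&gt; |table| executeInsert(table),
        .create_table =&gt; |table| executeCreateTable(table),
    };
}

pub fn main() void {
    const results = [_]Result{
        execute(.{ .select = "y" }),
        execute(.{ .select = "" }),
        execute(.{ .create_table = "y" }),
    };
    for (results) |r| {
        switch (r) {
            .val =&gt; |v| std.debug.print("ok: {s}\n", .{v}),
            .err =&gt; |e| std.debug.print("error: {s}\n", .{e}),
        }
    }
}
</pre></div>
<p>The explicit re-wrapping in the original is harmless; it would only become necessary if the helpers returned different result types that had to be converted into a common one.</p>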
<p>And now we're ready to put it all together in <code>main</code>!</p> <h3 id="<code>main</code>-(<code>main.zig</code>,-144-loc)"><code>main</code> (<code>main.zig</code>, 144 LoC)</h3><p>First we set up our arena allocator.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;rocksdb.zig&quot;</span><span class="p">).</span><span class="n">RocksDB</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">lex</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;lex.zig&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;parse.zig&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">execute</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;execute.zig&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">Storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;storage.zig&quot;</span><span class="p">).</span><span 
class="n">Storage</span><span class="p">;</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">arena</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">ArenaAllocator</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">heap</span><span class="p">.</span><span class="n">page_allocator</span><span class="p">);</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">deinit</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">allocator</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arena</span><span class="p">.</span><span class="n">allocator</span><span class="p">();</span> </pre></div> <p>Then we parse CLI arguments. Importantly we need to grab a location on disk for RocksDB to store data. 
And we need a query to execute.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">debugTokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">debugAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">scriptArg</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">databaseArg</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">i</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span 
class="w"> </span><span class="o">|</span><span class="n">arg</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;--debug-tokens&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">debugTokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;--debug-ast&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">debugAST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;--database&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">databaseArg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;--script&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">scriptArg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span 
class="p">.</span><span class="n">next</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">databaseArg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;--database is a required flag. Should be a directory for data.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">scriptArg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;--script is a required flag. 
Should be a file containing SQL.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next we read the file passed for the query.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">fs</span><span class="p">.</span><span class="n">cwd</span><span class="p">().</span><span class="n">openFileZ</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="n">scriptArg</span><span class="p">],</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">file_size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">getEndPos</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">allocator</span><span class="p">.</span><span class="n">alloc</span><span class="p">(</span><span class="kt">u8</span><span 
class="p">,</span><span class="w"> </span><span class="n">file_size</span><span class="p">);</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="n">file</span><span class="p">.</span><span class="n">readAll</span><span class="p">(</span><span class="n">prog</span><span class="p">);</span> </pre></div> <p>And pass the query to the lexer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">ArrayList</span><span class="p">(</span><span class="n">lex</span><span class="p">.</span><span class="n">Token</span><span class="p">).</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">lexErr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex</span><span class="p">.</span><span class="n">lex</span><span class="p">(</span><span class="n">prog</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">tokens</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">lexErr</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Failed to lex: {s}&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">debugTokens</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">token</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Token: {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">token</span><span class="p">.</span><span class="n">string</span><span class="p">()});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Program is empty&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Pass the tokens to the parser.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">parser</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">.</span><span class="n">Parser</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">ast</span><span class="o">:</span><span class="w"> </span><span class="n">parse</span><span class="p">.</span><span class="n">Parser</span><span class="p">.</span><span class="n">AST</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">parser</span><span class="p">.</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">.</span><span class="n">items</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Failed to parse: {s}&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">debugAST</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ast</span><span class="p">.</span><span class="n">print</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Initialize storage.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">dataDirectory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">os</span><span class="p">.</span><span class="n">argv</span><span class="p">[</span><span class="n">databaseArg</span><span class="p">]);</span> <span class="w"> 
</span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">RocksDB</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">dataDirectory</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Failed to open database: {s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">storage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Storage</span><span 
class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">db</span><span class="p">);</span> </pre></div> <p>And execute and print results. :)</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">executor</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">execute</span><span class="p">.</span><span class="n">Executor</span><span class="p">.</span><span class="n">init</span><span class="p">(</span><span class="n">allocator</span><span class="p">,</span><span class="w"> </span><span class="n">storage</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">executor</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="n">ast</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Failed to execute: {s}&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span 
class="o">|</span><span class="n">val</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">rows</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;ok</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;| &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s}</span><span class="se">\t\t</span><span class="s">|&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">field</span><span 
class="p">});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;+ &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">fields</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">field</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">fieldLen</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">field</span><span class="p">.</span><span class="n">len</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">fieldLen</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;=&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="n">fieldLen</span><span class="w"> 
</span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\t\t</span><span class="s">+&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">val</span><span class="p">.</span><span class="n">rows</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">row</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;| &quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">row</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">cell</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span 
class="p">(</span><span class="s">&quot;{s}</span><span class="se">\t\t</span><span class="s">|&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">cell</span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="build.zig">build.zig</h3><p>Finally, finally, tie it all together with <code>build.zig</code>.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;builtin&quot;</span><span class="p">).</span><span class="n">zig_version</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">build</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">build</span><span class="p">.</span><span class="n">Builder</span><span class="p">)</span><span class="w"> </span><span 
class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">exe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">addExecutable</span><span class="p">(</span><span class="s">&quot;main&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;main.zig&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkLibC</span><span class="p">();</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkSystemLibraryName</span><span class="p">(</span><span class="s">&quot;rocksdb&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">@hasDecl</span><span class="p">(</span><span class="nb">@TypeOf</span><span class="p">(</span><span class="n">exe</span><span class="p">.</span><span class="o">*</span><span class="p">),</span><span class="w"> </span><span class="s">&quot;addLibraryPath&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibraryPath</span><span class="p">(</span><span class="s">&quot;./rocksdb&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludePath</span><span class="p">(</span><span class="s">&quot;./rocksdb/include&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibPath</span><span 
class="p">(</span><span class="s">&quot;./rocksdb&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludeDir</span><span class="p">(</span><span class="s">&quot;./rocksdb/include&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">setOutputDir</span><span class="p">(</span><span class="s">&quot;.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">exe</span><span class="p">.</span><span class="n">target</span><span class="p">.</span><span class="n">isDarwin</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addRPath</span><span class="p">(</span><span class="s">&quot;.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">install</span><span class="p">();</span> <span class="p">}</span> </pre></div> <p>Grab RocksDB, build it, and build our CLI.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/facebook/rocksdb $<span class="w"> </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>rocksdb<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>make<span class="w"> </span>shared_lib<span class="w"> </span>-j8<span class="w"> </span><span class="o">)</span> <span class="c1"># ONLY IF YOU ARE ON A MAC</span> $<span class="w"> </span>cp<span class="w"> </span>rocksdb/*.dylib<span class="w"> </span>.<span class="w"> </span><span class="c1"># ONLY IF YOU ARE ON A 
MAC</span> <span class="c1"># DONE ONLY IF YOU ARE ON A MAC</span> $<span class="w"> </span>zig<span class="w"> </span>build </pre></div> <p>And give it a go. :)</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;CREATE TABLE y (year int, age int, name text)&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;CREATE TABLE y (year int, age int, name text)&quot;</span> ok $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (2010, 38, &#39;Gary&#39;)&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (2010, 38, &#39;Gary&#39;)&quot;</span> ok $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (2021, 92, &#39;Teej&#39;)&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (2021, 92, &#39;Teej&#39;)&quot;</span> ok $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (1994, 18, &#39;Mel&#39;)&quot;</span><span class="o">)</span> <span 
class="nb">echo</span><span class="w"> </span><span class="s2">&quot;INSERT INTO y VALUES (1994, 18, &#39;Mel&#39;)&quot;</span> ok <span class="c1"># Basic query</span> $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SELECT name, age, year FROM y&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SELECT name, age, year FROM y&quot;</span> <span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span>year<span class="w"> </span><span class="p">|</span> +<span class="w"> </span><span class="o">====</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+<span class="o">====</span><span class="w"> </span>+ <span class="p">|</span><span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span><span class="m">1994</span><span class="w"> </span><span class="p">|</span> <span class="p">|</span><span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span><span class="m">2010</span><span class="w"> </span><span class="p">|</span> <span class="p">|</span><span class="w"> </span>Teej<span class="w"> </span><span class="p">|</span><span class="m">92</span><span class="w"> </span><span class="p">|</span><span class="m">2021</span><span class="w"> </span><span class="p">|</span> <span class="c1"># With WHERE</span> $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> 
</span><span class="s2">&quot;SELECT name, year, age FROM y WHERE age &lt; 40&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SELECT name, year, age FROM y WHERE age &lt; 40&quot;</span> <span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span>year<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span> +<span class="w"> </span><span class="o">====</span><span class="w"> </span>+<span class="o">====</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+ <span class="p">|</span><span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">1994</span><span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span> <span class="p">|</span><span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">2010</span><span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span> <span class="c1"># With operations</span> $<span class="w"> </span>./main<span class="w"> </span>--database<span class="w"> </span>data<span class="w"> </span>--script<span class="w"> </span>&lt;<span class="o">(</span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SELECT &#39;Name: &#39; || name, year + 30, age FROM y WHERE age &lt; 40&quot;</span><span class="o">)</span> <span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;SELECT &#39;Name: &#39; || name, year + 30, age FROM y WHERE age &lt; 40&quot;</span> <span class="p">|</span><span class="w"> </span>unknown<span class="w"> </span><span class="p">|</span>unknown<span class="w"> </span><span class="p">|</span>age<span class="w"> </span><span class="p">|</span> +<span class="w"> </span><span class="o">=======</span><span class="w"> </span>+<span 
class="o">=======</span><span class="w"> </span>+<span class="o">===</span><span class="w"> </span>+ <span class="p">|</span><span class="w"> </span>Name:<span class="w"> </span>Mel<span class="w"> </span><span class="p">|</span><span class="m">2024</span><span class="w"> </span><span class="p">|</span><span class="m">18</span><span class="w"> </span><span class="p">|</span> <span class="p">|</span><span class="w"> </span>Name:<span class="w"> </span>Gary<span class="w"> </span><span class="p">|</span><span class="m">2040</span><span class="w"> </span><span class="p">|</span><span class="m">38</span><span class="w"> </span><span class="p">|</span> </pre></div> <h3 id="from-here">From Here</h3><p>As mentioned, this project is a vast simplification and there are plenty of bugs and subpar design choices. But hopefully it helps to make database development feel a little less intimidating!</p> <p>If you liked this, here are some other things you might want to check out!</p> <ul> <li><a href="https://www.goodreads.com/book/show/23463279-designing-data-intensive-applications">Designing Data-Intensive Applications</a></li> <li><a href="https://www.goodreads.com/en/book/show/44647144-database-internals">Database Internals: A Deep Dive Into How Distributed Data Systems Work</a></li> <li><a href="https://reddit.com/r/databasedevelopment">r/databasedevelopment</a></li> <li><a href="https://eatonphil.com/discord.html">The #dbs channel on a software internals/hacking Discord I run</a></li> <li><a href="https://github.com/gosql">gosql</a></li> </ul> <p>And of course, other posts on this blog. :)</p> <p>Lastly, a few resources that helped me out while hacking on this:</p> <ul> <li><a href="https://ziglang.org/documentation/master/">Zig Documentation</a></li> <li>Browsing the source code (and tests!!) 
of standard library data structures</li> <li><a href="https://discord.gg/gxsFFjE">Zig Programming Language Discord's #zig-help channel</a><ul> <li>Friendly and helpful crowd :)</li> </ul> </li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Spent a month hacking on it and happy to finally have this post out.<br><br>Let&#39;s build a basic SQL database in Zig on top of RocksDB. 😃<a href="https://t.co/fkSnaEKsya">https://t.co/fkSnaEKsya</a> <a href="https://t.co/adfpMvvvOn">pic.twitter.com/adfpMvvvOn</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1591974393130934273?ref_src=twsrc%5Etfw">November 14, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/zigrocks-sql.htmlSun, 13 Nov 2022 00:00:00 +0000A minimal RocksDB example with Zighttp://notes.eatonphil.com/zigrocks.html<p>I have mostly programmed in Go over the last few years. So every time I wanted an embedded key-value database, I reached for Cockroach's <a href="https://github.com/cockroachdb/pebble">Pebble</a>.</p> <p>Pebble is great for Go programming but Go does not embed well into other languages. Pebble was inspired by <a href="https://github.com/facebook/rocksdb">RocksDB</a> (and its predecessor, <a href="https://github.com/google/leveldb">LevelDB</a>). Both were written in C++, which can more easily be embedded into any language with a C foreign function interface. Pebble also lacks some features that RocksDB has, <a href="https://github.com/facebook/rocksdb/wiki/Transactions">transactions</a> for example.</p> <p>So I've been wanting to get familiar with RocksDB. And I've been learning Zig, so I set out to write a simple Zig program that embeds RocksDB. (If you see weird things in my Zig code and have suggestions, <a href="mailto:[email protected]">send me a note</a>!)</p> <p>This post is going to be a mix of RocksDB explanations and Zig explanations. 
By the end we'll have a simple CLI over a durable store that is able to set keys, get keys, and list all key-value pairs (optionally filtered on a key prefix).</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./kv<span class="w"> </span><span class="nb">set</span><span class="w"> </span>x<span class="w"> </span><span class="m">1</span> $<span class="w"> </span>./kv<span class="w"> </span>get<span class="w"> </span>x <span class="m">1</span> $<span class="w"> </span>./kv<span class="w"> </span><span class="nb">set</span><span class="w"> </span>y<span class="w"> </span><span class="m">22</span> $<span class="w"> </span>./kv<span class="w"> </span>list<span class="w"> </span>x <span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span> $<span class="w"> </span>./kv<span class="w"> </span>list<span class="w"> </span>y <span class="nv">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">22</span> $<span class="w"> </span>./kv<span class="w"> </span>list <span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span> <span class="nv">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">22</span> </pre></div> <p>Basic stuff!</p> <p>You can find the code for this post in the <a href="https://github.com/eatonphil/zigrocks">rocksdb.zig file on GitHub</a>. To simplify things, this code is only going to work on Linux. And it will require Zig 0.10.x.</p> <h3 id="rocksdb">RocksDB</h3><p>RocksDB is written in C++. But most languages cannot interface with C++. (Zig cannot either, as far as I understand.) So most C++ libraries expose a C API that is easier for other programming languages to interact with. RocksDB does this. 
Great!</p> <p>Now RocksDB's <a href="https://github.com/facebook/rocksdb/wiki">C++ documentation</a> is phenomenal, especially among C++ libraries. But if there is documentation for the C API, I couldn't find it. Instead you must trawl through the <a href="https://github.com/facebook/rocksdb/blob/main/include/rocksdb/c.h">C header file</a>, the <a href="https://github.com/facebook/rocksdb/blob/main/db/c.cc">C wrapper implementation</a>, and the <a href="https://github.com/facebook/rocksdb/blob/main/db/c_test.c">C tests</a>.</p> <p>There was also a <a href="https://gist.github.com/nitingupta910/4640638be7e7ad39c41e">great gist showing a minimal RocksDB C example</a>. It didn't cover the iterator API for fetching a range of keys with a prefix, but with the C tests file I was able to figure it out, I think.</p> <p>Let's dig in!</p> <h3 id="creating,-opening-and-closing-a-rocksdb-database">Creating, opening and closing a RocksDB database</h3><p>First we need to import the C header so that Zig can verify at compile time the foreign functions we call. We'll also import the standard library that we'll use later.</p> <p>Aside from <code>build.zig</code> below, all code should be in <code>main.zig</code>.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">const</span><span class="w"> </span><span class="n">rdb</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@cImport</span><span class="p">(</span><span class="nb">@cInclude</span><span class="p">(</span><span class="s">&quot;rocksdb/c.h&quot;</span><span class="p">));</span> </pre></div> <p class="note"> Don't read anything into the <code>@</code> other than that it marks a compiler builtin. 
It's used for imports, casting, and other metaprogramming. </p><p>Now we can build our wrapper. It will be a Zig struct that contains a pointer to the RocksDB instance.</p> <div class="highlight"><pre><span></span><span class="kr">const</span><span class="w"> </span><span class="n">RocksDB</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="o">*</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_t</span><span class="p">,</span> </pre></div> <p>To open a database we'll call <code>rocksdb_open()</code> with a directory name for RocksDB to store data. And we'll tell RocksDB to create the database if it doesn't already exist.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">open</span><span class="p">(</span><span class="n">dir</span><span class="o">:</span><span class="w"> </span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">options</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span 
class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_create</span><span class="p">();</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_options_set_create_if_missing</span><span class="p">(</span><span class="n">options</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="o">:</span><span class="w"> </span><span class="o">?*</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_open</span><span class="p">(</span><span class="n">options</span><span class="p">,</span><span class="w"> </span><span class="n">dir</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">err</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">{</span><span class="w"> </span><span class="p">.</span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="o">?</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Finally, we close with <code>rocksdb_close()</code>:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">)</span><span class="w"> 
</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_close</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>The RocksDB aspect of this is easy. But there's a bunch of Zig-specific details I should (try to) explain.</p> <h4 id="return-types">Return types</h4><p>Zig has a cool <a href="https://ziglang.org/documentation/master/#Errors"><code>error</code></a> type. <code>try</code>/<code>catch</code> in Zig work only with this <code>error</code> type and subsets of it you can create. <code>error</code> is an enum. But Zig <code>error</code>s are not ML-style tagged unions (yet?). That is, you cannot both return an error and some dynamic information about the error. So the usefulness of <code>error</code> is limited. It mostly only works if the errors are a finite set without dynamic aspects.</p> <p>Zig also doesn't have multiple return values. But it does have optional types (denoted with <code>?</code>) and it has anonymous structs.</p> <p>So we can do a slightly less safe, but more informational, error type by returning a struct with an optional success value and an optional error.</p> <p>That's how we get the return type <code>struct { val: ?RocksDB, err: ?[]u8 }</code>.</p> <p>This is not very different from Go, certainly no less safe, and I'm probably biased to use this as a Go programmer.</p> <p class="note"> Felix Queißner points out to me that there are tagged unions in Zig that would be more safe here. Instead of <code>struct { val: ?RocksDB, err: ?[]u8 }</code> I could do <code>union(enum) { val: RocksDB, err: []u8 }</code>. When I get a chance to play with that syntax I'll modify this post. 
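</p> <p>A minimal sketch of what that tagged-union version might look like. This is not the code from this post: <code>Handle</code>, <code>OpenResult</code>, and the <code>fail</code> flag are hypothetical stand-ins for the real <code>RocksDB</code> wrapper and the RocksDB C calls, so that the snippet compiles on its own.</p>
<div class="highlight"><pre><span></span>const std = @import(&quot;std&quot;);

// Hypothetical stand-in for the RocksDB wrapper struct.
const Handle = struct { id: u32 };

// Exactly one of the two variants is active at a time.
const OpenResult = union(enum) {
    val: Handle,
    err: []const u8,
};

// Hypothetical open(): the `fail` flag simulates the C API reporting an error.
fn open(fail: bool) OpenResult {
    if (fail) {
        return OpenResult{ .err = &quot;could not open&quot; };
    }
    return OpenResult{ .val = Handle{ .id = 1 } };
}

pub fn main() void {
    // The switch has to handle every variant of the union.
    switch (open(false)) {
        .val =&gt; |handle| std.debug.print(&quot;opened handle {d}\n&quot;, .{handle.id}),
        .err =&gt; |msg| std.debug.print(&quot;error: {s}\n&quot;, .{msg}),
    }
}
</pre></div>
<p class="note"> Because the <code>switch</code> must cover both variants, a caller of this sketch cannot reach the value without also deciding what to do with the error.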
</p><h4 id="optional-pointers">Optional pointers</h4><p>The next thing you may notice is <code>?*rdb.rocksdb_options_t</code> and <code>?*rdb.rocksdb_t</code>. This is to work with Zig's type system. Zig expects that pointers are not null. By adding <code>?</code> we are telling Zig that this value can be null. That way the Zig type system will force us to handle the null condition if we try to access fields on the value.</p> <p>In the options case, it doesn't really matter if the result is <code>null</code> or not. In the database case, we handle nullness by checking the error value with <code>if (err != null)</code>. If this condition is <em>not</em> met, we know the database is not null. So we use <code>db.?</code> to assert that and return a value that, in the type system, is not null.</p> <h4 id="zig-strings,-c-strings">Zig strings, C strings</h4><p>Another thing you may notice is <code>var err: ?[*:0]u8 = null;</code>. Zig strings are expressed as byte arrays or byte slices. <code>[]u8</code> and <code>[]const u8</code> are slices that keep track of the number of items. <code>[*:0]u8</code> is <em>not</em> a byte slice. It stores no length and is only null-terminated. To go from the null-terminated pointer that the C API returns to the <code>[]u8</code> (a slice that carries its length) in our function's return signature we use <a href="https://github.com/ziglang/zig/blob/30b8b29f88362d18ea6523a859b29f7bc6dec622/lib/std/mem.zig"><code>std.mem.span</code></a>.</p> <p><a href="https://stackoverflow.com/questions/72736997/how-to-pass-a-c-string-into-a-zig-function-expecting-a-zig-string">This Stack Overflow post</a> was useful for understanding this.</p> <h4 id="structs">Structs</h4><p>Anonymous structs in Zig are prefixed with a <code>.</code>. 
And all struct fields, anonymous or not, are prefixed with <code>.</code>.</p> <p>So <code>.{.x = 1}</code> instantiates an anonymous struct that has one field <code>x</code>.</p> <p>Struct fields in Zig cannot <em>not</em> be instantiated, even if they are nullable. And when you initialize a nullable value you don't need to wrap it in a <code>Some()</code> like you might do in an ML.</p> <p>One thing I found surprising about Zig anonymous structs is that instances of the anonymous <em>type</em> are created per function and two anonymous structs that are structurally identical but referenced in different functions are not actually type-equal.</p> <p>So this doesn't compile:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span> <span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">};</span> <span class="p">}</span> <span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">doA</span><span class="p">();</span> <span class="p">}</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doB</span><span class="p">();</span> <span class="p">}</span> <span class="err">$</span><span class="w"> </span><span class="n">zig</span><span class="w"> </span><span class="n">build</span><span class="o">-</span><span class="n">exe</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span> <span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">5</span><span class="o">:</span><span class="mi">15</span><span class="o">:</span><span class="w"> </span><span class="k">error</span><span class="o">:</span><span class="w"> </span><span class="n">expected</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="err">&#39;</span><span class="k">test</span><span class="p">.</span><span class="n">doB__struct_2890</span><span class="err">&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="err">&#39;</span><span class="k">test</span><span class="p">.</span><span class="n">doA__struct_3878</span><span class="err">&#39;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">doA</span><span class="p">();</span> <span class="w"> </span><span class="o">~~~^~</span> <span class="k">test</span><span class="p">.</span><span 
class="n">zig</span><span class="o">:</span><span class="mi">1</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> </span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span> <span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span> <span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">4</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> </span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span> <span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span> <span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">4</span><span class="o">:</span><span class="mi">10</span><span class="o">:</span><span class="w"> 
</span><span class="n">note</span><span class="o">:</span><span class="w"> </span><span class="n">function</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kt">type</span><span class="w"> </span><span class="n">declared</span><span class="w"> </span><span class="n">here</span> <span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">^~~~~~~~~~~~~~~~</span> <span class="n">referenced</span><span class="w"> </span><span class="n">by</span><span class="o">:</span> <span class="w"> </span><span class="n">main</span><span class="o">:</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">8</span><span class="o">:</span><span class="mi">9</span> <span class="w"> </span><span class="n">callMain</span><span class="o">:</span><span class="w"> </span><span class="o">/</span><span class="n">whatever</span><span class="o">/</span><span class="n">lib</span><span class="o">/</span><span class="n">std</span><span class="o">/</span><span class="n">start</span><span class="p">.</span><span class="n">zig</span><span class="o">:</span><span class="mi">606</span><span class="o">:</span><span class="mi">32</span> <span class="w"> </span><span class="n">remaining</span><span class="w"> </span><span class="n">reference</span><span class="w"> </span><span class="n">traces</span><span class="w"> </span><span class="n">hidden</span><span class="p">;</span><span class="w"> </span><span class="n">use</span><span class="w"> </span><span 
class="err">&#39;</span><span class="o">-</span><span class="n">freference</span><span class="o">-</span><span class="n">trace</span><span class="err">&#39;</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">see</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span class="n">reference</span><span class="w"> </span><span class="n">traces</span> </pre></div> <p>You would need to instantiate a new anonymous struct in the second function.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="k">test</span><span class="p">.</span><span class="n">zig</span> <span class="k">fn</span><span class="w"> </span><span class="n">doA</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">};</span> <span class="p">}</span> <span class="k">fn</span><span class="w"> </span><span class="n">doB</span><span class="p">()</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">y</span><span class="o">:</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span 
class="p">.</span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doA</span><span class="p">().</span><span class="n">y</span><span class="w"> </span><span class="p">};</span> <span class="p">}</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">doB</span><span class="p">();</span> <span class="p">}</span> </pre></div> <h4 id="uniform-function-call-syntax">Uniform function call syntax</h4><p>Zig seems to support something like <a href="https://en.wikipedia.org/wiki/Uniform_Function_Call_Syntax">uniform function call syntax</a> where you can either call a function with arguments or you can omit the first argument by prefixing the function call with <code>firstargument.</code>. I.e. <code>x.add(y)</code> and <code>add(x, y)</code>.</p> <p>In the case of this code it would be <code>RocksDB.close(db)</code> vs <code>db.close()</code> assuming <code>db</code> is an instance of the <code>RocksDB</code> struct.</p> <p>Like Python, the use of <code>self</code> as the name of this first parameter of a struct's methods is purely convention. 
You can call it whatever.</p> <p>The point is that we always expect the user to call <code>open()</code> as <code>var db = RocksDB.open()</code> and allow the user to call <code>close()</code> as <code>db.close()</code>.</p> <p>Let's move on!</p> <h3 id="setting-a-key-value-pair">Setting a key-value pair</h3><p>We set a pair by calling <code>rocksdb_put</code> with the database instance, some options (which we'll leave at their defaults), and the key and value strings as C strings.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">set</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">writeOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_writeoptions_create</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span 
class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_put</span><span class="p">(</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span> <span class="w"> </span><span class="n">writeOptions</span><span class="p">,</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="w"> </span><span class="o">&amp;</span><span class="n">err</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">errStr</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">errStr</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>The only Zig-specific detail is the use of <code>key.ptr</code> and <code>value.ptr</code> to satisfy the Zig / C type system. The type signatures <code>key: [:0]const u8</code> and <code>value: [:0]const u8</code> make sure that the user passes in null-terminated byte slices, which is what the RocksDB API expects.</p> <h3 id="getting-a-value-from-a-key">Getting a value from a key</h3><p>We get a value by calling <code>rocksdb_get</code> with the database instance, some options (which we'll again leave at their defaults), and the key as a C string.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">get</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">readOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span 
class="p">.</span><span class="n">rocksdb_readoptions_create</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">valueLength</span><span class="o">:</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[</span><span class="o">*:</span><span class="mi">0</span><span class="p">]</span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_get</span><span class="p">(</span> <span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span> <span class="w"> </span><span class="n">readOptions</span><span class="p">,</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="w"> </span><span class="n">key</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="w"> </span><span class="o">&amp;</span><span class="n">valueLength</span><span class="p">,</span> <span class="w"> </span><span class="o">&amp;</span><span class="n">err</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">err</span><span class="p">)</span><span class="w"> 
</span><span class="o">|</span><span class="n">errStr</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">span</span><span class="p">(</span><span class="n">errStr</span><span class="p">)</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="n">v</span><span class="p">[</span><span class="mi">0</span><span class="p">..</span><span class="n">valueLength</span><span class="p">],</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>One thing in there to call out is that we can go from a null-delimited value <code>v</code> to a standard Zig slice <code>[]u8</code> by slicing from <code>0</code> to the length of the value returned by the C API.</p> <p>Also, <code>rocksdb_get</code> is only used for getting a single key-value pair. We'll handle key-value pair iteration next.</p> <h3 id="iterating-over-key-value-pairs">Iterating over key-value pairs</h3><p>The basic structure of RocksDB's iterator API is that you first create an iterator instance with <code>rocksdb_create_iterator()</code>. Then you either <code>rocksdb_iter_seek_to_first()</code> or <code>rocksdb_iter_seek()</code> (with a prefix) to get the iterator ready. Then you get the current iterator entry's key with <code>rocksdb_iter_key()</code> and value with <code>rocksdb_iter_value()</code>. You move on to the next entry in the iterator with <code>rocksdb_iter_next()</code> and check that the current iterator value is valid with <code>rocksdb_iter_valid()</code>. When the iterator is no longer valid, or if you want to stop iterating early, you call <code>rocksdb_iter_destroy()</code>.</p> <p>But we'd like to present a Zig-only interface to users of the <code>RocksDB</code> Zig struct. 
So we'll create a <code>RocksDB.iter()</code> function that returns a <code>RocksDB.Iter</code> with a <code>RocksDB.Iter.next()</code> function that returns an optional <code>RocksDB.IterEntry</code>.</p> <p>We'll work backwards, starting with that <code>RocksDB.Iter</code> struct.</p> <h4 id="<code>rocksdb.iter</code>"><code>RocksDB.Iter</code></h4><p>Each iterator instance will store a pointer to a RocksDB iterator instance. It will also store the prefix requested (which is allowed to be an empty string). If the prefix is set though, we'll only iterate while the iterator key has the requested prefix.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">IterEntry</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">key</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">Iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">iter</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iterator_t</span><span class="p">,</span> <span class="w"> </span><span class="n">first</span><span class="p">:</span><span class="w"> 
</span><span class="nb nb-Type">bool</span><span class="p">,</span> <span class="w"> </span><span class="n">prefix</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="k">const</span><span class="w"> </span><span class="n">u8</span><span class="p">,</span> <span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">next</span><span class="p">(</span><span class="bp">self</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="n">Iter</span><span class="p">)</span><span class="w"> </span><span class="err">?</span><span class="n">IterEntry</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="bp">self</span><span class="o">.</span><span class="n">first</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_next</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="bp">false</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_valid</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb nb-Type">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">keySize</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_key</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">keySize</span><span class="p">);</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Make</span><span class="w"> </span><span class="n">sure</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="k">is</span><span class="w"> </span><span class="n">still</span><span class="w"> </span><span class="n">within</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">prefix</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span 
class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">keySize</span><span class="w"> </span><span class="ow">or</span> <span class="w"> </span><span class="o">!</span><span class="n">std</span><span class="o">.</span><span class="n">mem</span><span class="o">.</span><span class="n">eql</span><span class="p">(</span><span class="n">u8</span><span class="p">,</span><span class="w"> </span><span class="n">key</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">self</span><span class="o">.</span><span class="n">prefix</span><span class="o">.</span><span class="n">len</span><span class="p">],</span><span class="w"> </span><span class="bp">self</span><span class="o">.</span><span class="n">prefix</span><span class="p">))</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb nb-Type">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">valueSize</span><span class="p">:</span><span class="w"> </span><span class="n">usize</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="o">.</span><span class="n">rocksdb_iter_value</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">iter</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">valueSize</span><span class="p">);</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="n">IterEntry</span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">key</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">keySize</span><span class="p">],</span> <span class="w"> </span><span class="o">.</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">[</span><span class="mf">0.</span><span class="o">.</span><span class="n">valueSize</span><span class="p">],</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Finally we'll wrap the <code>rocksdb_iter_destroy()</code> function:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">close</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">Iter</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_destroy</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <h4 id="<code>rocksdb.iter()</code>"><code>RocksDB.iter()</code></h4><p>Now we can write the function that creates the <code>RocksDB.Iter</code>. As previously mentioned, we must first instantiate the RocksDB iterator and then <code>seek</code> to the first entry if the user doesn't request a prefix. 
Or if the user requests a prefix, we <code>seek</code> to the first key at or after that prefix.</p> <div class="highlight"><pre><span></span><span class="k">fn</span><span class="w"> </span><span class="n">iter</span><span class="p">(</span><span class="n">self</span><span class="o">:</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">,</span><span class="w"> </span><span class="n">prefix</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="p">)</span><span class="w"> </span><span class="k">struct</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">val</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="n">Iter</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="o">:</span><span class="w"> </span><span class="o">?</span><span class="p">[]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">readOptions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_readoptions_create</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">it</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Iter</span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">undefined</span><span class="p">,</span> <span class="w"> 
</span><span class="p">.</span><span class="n">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span> <span class="w"> </span><span class="p">.</span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">prefix</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_create_iterator</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">db</span><span class="p">,</span><span class="w"> </span><span class="n">readOptions</span><span class="p">))</span><span class="w"> </span><span class="o">|</span><span class="n">i</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;Could not create iterator&quot;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">prefix</span><span class="p">.</span><span class="n">len</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_seek</span><span class="p">(</span> <span class="w"> </span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="p">,</span> <span class="w"> </span><span class="n">prefix</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="w"> </span><span class="n">prefix</span><span class="p">.</span><span class="n">len</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rdb</span><span class="p">.</span><span class="n">rocksdb_iter_seek_to_first</span><span class="p">(</span><span class="n">it</span><span class="p">.</span><span class="n">iter</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="p">.</span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">it</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="p">};</span> </pre></div> <p>And now we have a basic Zig wrapper for the 
RocksDB API!</p> <h3 id="<code>main</code>"><code>main</code></h3><p>Next we write a simple command-line entrypoint that uses the RocksDB wrapper we built. This is not the prettiest code but it gets the job done.</p> <div class="highlight"><pre><span></span><span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="o">!</span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">openRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">RocksDB</span><span class="p">.</span><span class="n">open</span><span class="p">(</span><span class="s">&quot;/tmp/db&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">openRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Failed to open: {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">openRes</span><span class="p">.</span><span class="n">val</span><span class="p">.</span><span class="o">?</span><span 
class="p">;</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">process</span><span class="p">.</span><span class="n">args</span><span class="p">();</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">();</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">key</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">value</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="o">:</span><span class="mi">0</span><span class="p">]</span><span class="kr">const</span><span class="w"> </span><span class="kt">u8</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;get&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span 
class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">arg</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;set&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;set&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span> <span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span 
class="w"> </span><span class="s">&quot;get&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;get&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">().</span><span class="o">?</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;list&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">command</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;lst&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">args</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">argNext</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">argNext</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Must specify command (get, set, or list). Got: &#39;{s}&#39;.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">arg</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;set&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">setErr</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">setErr</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Error setting key: {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="p">.</span><span class="n">mem</span><span class="p">.</span><span class="n">eql</span><span class="p">(</span><span class="kt">u8</span><span class="p">,</span><span class="w"> </span><span class="n">command</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;get&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">getRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">get</span><span class="p">(</span><span class="n">key</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">getRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Error 
getting key: {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">getRes</span><span class="p">.</span><span class="n">val</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">v</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">v</span><span class="p">});</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Key not found.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">key</span><span class="p">;</span> <span 
class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iterRes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">db</span><span class="p">.</span><span class="n">iter</span><span class="p">(</span><span class="n">prefix</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">iterRes</span><span class="p">.</span><span class="n">err</span><span class="p">)</span><span class="w"> </span><span class="o">|</span><span class="n">err</span><span class="o">|</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;Error getting iterator: {s}.</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="n">err</span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kr">var</span><span class="w"> </span><span class="n">iter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">iterRes</span><span class="p">.</span><span class="n">val</span><span class="p">.</span><span class="o">?</span><span class="p">;</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="n">iter</span><span class="p">.</span><span class="n">close</span><span class="p">();</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">iter</span><span class="p">.</span><span class="n">next</span><span class="p">())</span><span class="w"> </span><span class="o">|</span><span class="n">entry</span><span class="o">|</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="p">.</span><span class="n">debug</span><span class="p">.</span><span class="n">print</span><span class="p">(</span><span class="s">&quot;{s} = {s}</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.{</span><span class="w"> </span><span class="n">entry</span><span class="p">.</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">entry</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Notably, the <code>main</code> function must be marked <code>pub</code>. The struct and struct methods we wrote would need to be marked <code>pub</code> if we wanted them accessible from other files. But since this is a single file, <code>pub</code> doesn't matter. Except for <code>main</code>.</p> <p>Now we can get into building.</p> <h3 id="building">Building</h3><p>First we need to compile the RocksDB library. 
To do this we simply <code>git clone</code> RocksDB and run <code>make shared_lib</code>.</p> <h4 id="compiling-rocksdb">Compiling RocksDB</h4><div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/facebook/rocksdb $<span class="w"> </span><span class="o">(</span><span class="w"> </span><span class="nb">cd</span><span class="w"> </span>rocksdb<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>make<span class="w"> </span>shared_lib<span class="w"> </span>-j8<span class="w"> </span><span class="o">)</span> </pre></div> <p>This may take a while, sorry.</p> <h4 id="<code>build.zig</code>"><code>build.zig</code></h4><p>Next we need to write a <code>build.zig</code> script that tells Zig about this external library. This was one of the harder parts of the process, but building and linking against foreign libraries is almost always hard.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">build</span><span class="p">.</span><span class="n">zig</span> <span class="kr">const</span><span class="w"> </span><span class="n">version</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;builtin&quot;</span><span class="p">).</span><span class="n">zig_version</span><span class="p">;</span> <span class="kr">const</span><span class="w"> </span><span class="n">std</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">@import</span><span class="p">(</span><span class="s">&quot;std&quot;</span><span class="p">);</span> <span class="kr">pub</span><span class="w"> </span><span class="k">fn</span><span class="w"> </span><span class="n">build</span><span class="p">(</span><span class="n">b</span><span class="o">:</span><span 
class="w"> </span><span class="o">*</span><span class="n">std</span><span class="p">.</span><span class="n">build</span><span class="p">.</span><span class="n">Builder</span><span class="p">)</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">const</span><span class="w"> </span><span class="n">exe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span><span class="p">.</span><span class="n">addExecutable</span><span class="p">(</span><span class="s">&quot;main&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;main.zig&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkLibC</span><span class="p">();</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">linkSystemLibraryName</span><span class="p">(</span><span class="s">&quot;rocksdb&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addLibraryPath</span><span class="p">(</span><span class="s">&quot;./rocksdb&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">addIncludePath</span><span class="p">(</span><span class="s">&quot;./rocksdb/include&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">setOutputDir</span><span class="p">(</span><span class="s">&quot;.&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">exe</span><span class="p">.</span><span class="n">install</span><span class="p">();</span> <span class="p">}</span> </pre></div> <p>Felix Queißner's <a href="https://zig.news/xq/zig-build-explained-part-3-1ima">zig build explained</a> series was quite helpful.</p> 
<p>Now we just:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zig<span class="w"> </span>build </pre></div> <p>And run!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./main<span class="w"> </span>list $<span class="w"> </span>./main<span class="w"> </span><span class="nb">set</span><span class="w"> </span>x<span class="w"> </span><span class="m">12</span> $<span class="w"> </span>./main<span class="w"> </span><span class="nb">set</span><span class="w"> </span>xy<span class="w"> </span><span class="m">300</span> $<span class="w"> </span>./main<span class="w"> </span>list <span class="nv">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">12</span> <span class="nv">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span> $<span class="w"> </span>./main<span class="w"> </span>get<span class="w"> </span>xy <span class="m">300</span> $<span class="w"> </span>./main<span class="w"> </span>list<span class="w"> </span>xy <span class="nv">xy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">300</span> </pre></div> <p>Not bad!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on using RocksDB with Zig! There weren&#39;t a lot of good examples of the C API and it was good practice for learning Zig.<br><br>Also sets me up for integrating it in a (WIP) port of my toy SQL database from Go to Zig. 
(This time with storage!)<a href="https://t.co/zquojV974G">https://t.co/zquojV974G</a> <a href="https://t.co/gtAsB6Wrhi">pic.twitter.com/gtAsB6Wrhi</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1586908890960117760?ref_src=twsrc%5Etfw">October 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/zigrocks.htmlSun, 30 Oct 2022 00:00:00 +0000A database without dynamic memory allocationhttp://notes.eatonphil.com/a-database-without-dynamic-memory.html<head> <meta http-equiv="refresh" content="4;URL='https://tigerbeetle.com/blog/a-database-without-dynamic-memory/'" /> </head><p>This is an external post of mine. Click <a href="https://tigerbeetle.com/blog/a-database-without-dynamic-memory/">here</a> if you are not redirected.</p> http://notes.eatonphil.com/a-database-without-dynamic-memory.htmlWed, 12 Oct 2022 00:00:00 +0000A minimal distributed key-value database with Hashicorp's Raft libraryhttp://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.html<p>When I wrote the "<a href="/distributed-postgres.html">build a distributed PostgreSQL proof of concept</a>" post I first had to figure out how to use <a href="https://github.com/hashicorp/raft">Hashicorp's Raft implementation</a>.</p> <p>There weren't any examples I could find in the Hashicorp repo itself. And the only example I <em>could</em> find was Philip O'Toole's <a href="https://github.com/otoolep/hraftd">hraftd</a>. It's great! However, I have a hard time following multi-file examples in general.</p> <p>So I built my own <a href="https://github.com/eatonphil/raft-example">single-file example</a>. It's not perfect but it helped me get started and may help you too. We'll walk through that code, ~260 lines of Go, in this post.</p> <p>The key-value database will only be able to set keys, not delete them. But it will be able to overwrite existing entries. 
And it will expose this distributed key-value database over an HTTP API.</p> <p>Here's a sample interaction it will be able to support.</p> <p>Terminal 1:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span> </pre></div> <p>Terminal 2:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span> </pre></div> <p>Terminal 3, tell 1 to have 2 follow it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8222/join?followerAddr=localhost:2223&amp;followerId=node2&#39;</span> </pre></div> <p>Terminal 3, now add a key:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s1">&#39;localhost:8222/set&#39;</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;key&quot;: &quot;x&quot;, &quot;value&quot;: &quot;23&quot;}&#39;</span><span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;content-type: application/json&#39;</span> </pre></div> <p>Terminal 3, now get the key from either server:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8222/get?key=x&#39;</span> <span class="o">{</span><span class="s2">&quot;data&quot;</span>:<span class="s2">&quot;23&quot;</span><span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span><span 
class="s1">&#39;localhost:8223/get?key=x&#39;</span> <span class="o">{</span><span class="s2">&quot;data&quot;</span>:<span class="s2">&quot;23&quot;</span><span class="o">}</span> </pre></div> <p>Let's make it happen!</p> <h3 id="eine-kleine-background">Eine kleine background</h3><p>Raft is an algorithm for managing a replicated (basically append-only) log over a cluster of nodes. When you combine this with a state machine you get a stateful, distributed application. Log entries act as commands for the state machine. When a node in the Raft cluster crashes, it is brought up to date by sending (also called "replaying") all commands in the log through the state machine.</p> <p>This can be made more efficient by implementing an application-specific concept of state snapshots. But since snapshots are just an optimization, we'll skip it entirely to keep this application simple.</p> <p>If you want the details, just <a href="https://raft.github.io/raft.pdf">read the Raft paper</a>! It is surprisingly accessible, especially as a user.</p> <h3 id="our-app">Our app</h3><p>In our distributed key-value application, commands will be a serialized struct with a key and a value. The state machine will take each struct and set the key to the value in memory. Thus after replaying the entire log (and continuing to apply future log entries), each node will have an in-memory key-value store that is up to date with all other nodes in the cluster.</p> <p>Note that although each node's key-value store will only be in memory, it will be backed by the durable append-only log! 
So with Raft, each in-memory key-value store will still be durable.</p> <p>Let's get things set up in a file, <code>main.go</code>.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;net&quot;</span> <span class="w"> </span><span class="s">&quot;net/http&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;path&quot;</span> <span class="w"> </span><span class="s">&quot;sync&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/hashicorp/raft&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/hashicorp/raft-boltdb&quot;</span> <span class="p">)</span> </pre></div> <h3 id="the-state-machine">The state machine</h3><p>The state machine acts on an in-memory key-value store.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kvFsm</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="p">}</span> </pre></div> <p>There are three operations this Raft library wants us to implement on our state machine struct.</p> <h4 id="apply">Apply</h4><p>The Apply operation is sent to basically-up-to-date nodes to keep them up to date. 
An Apply call is made for each new log the leader commits.</p> <p>Each log message will contain a key and value to store in the in-memory key-value store.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">setPayload</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Key</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">Value</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Log</span><span class="p">)</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">LogCommand</span><span class="p">:</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="nx">setPayload</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span 
class="nx">log</span><span class="p">.</span><span class="nx">Data</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">sp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not parse payload: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">sp</span><span class="p">.</span><span class="nx">Key</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown raft log type: %#v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Here we're reading a log in a custom format. 
Later on, in the HTTP server, we'll write the part that submits logs in this custom format.</p> <p>The Raft library only cares that logs are (opaque) bytes. Any format works.</p> <h4 id="restore">Restore</h4><p>The Restore operation reads all logs and applies them to the state machine.</p> <p>It looks very similar to the <code>Apply</code> function we just wrote, except that it operates on an <code>io.ReadCloser</code> of serialized log data rather than the high-level <code>raft.Log</code> struct.</p> <p>Most importantly, and unlike the <code>Apply</code> function, <code>Restore</code> must reset all local state.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Restore</span><span class="p">(</span><span class="nx">rc</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadCloser</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Must always restore from a clean state!!</span> <span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Range</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">key</span><span class="w"> </span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span class="nx">db</span><span 
class="p">.</span><span class="nx">Delete</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="nx">decoder</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">rc</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">decoder</span><span class="p">.</span><span class="nx">More</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="nx">setPayload</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decoder</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">sp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not decode payload: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">kf</span><span class="p">.</span><span 
class="nx">db</span><span class="p">.</span><span class="nx">Store</span><span class="p">(</span><span class="nx">sp</span><span class="p">.</span><span class="nx">Key</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">rc</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="p">}</span> </pre></div> <p>The <code>io.ReadCloser</code> represents the latest snapshot or the beginning of time if there are no snapshots.</p> <h4 id="snapshot">Snapshot</h4><p>We won't implement this. But to satisfy the Go interface we must still define some empty functions.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="w"> </span><span class="kd">struct</span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Persist</span><span class="p">(</span><span class="nx">_</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">SnapshotSink</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Release</span><span 
class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Snapshot</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">FSMSnapshot</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p class="note"> I <em>think</em> this is a correct noop. If we implemented a real snapshot we'd serialize the current key-value state, and <code>raft.SnapshotSink.Write()</code> it to the <code>raft.SnapshotSink</code>. That sink, in turn, is what is passed (as an <code>io.ReadCloser</code>) to the <code>Restore</code> method above. <br /> <br /> So it must be that when we do not call <code>raft.SnapshotSink.Close()</code>, <a href="https://pkg.go.dev/github.com/hashicorp/raft#FSMSnapshot">as the docs suggest</a>, no snapshot gets recorded. <br /> <br /> Since we aren't implementing snapshots, the Raft library must be doing its own serialization, writing each message's bytes directly to some sink. <br /> <br /> If I'm wrong, <a href="mailto:[email protected]">feel free to correct me</a>. 
</p><p>That's it for the state machine!</p> <h3 id="raft-node-initialization">Raft node initialization</h3><p>To start the Raft library on each node, we need a fair amount of initialization boilerplate.</p> <p>Each Raft node needs a TCP port that it uses to communicate with other nodes in the same cluster.</p> <p>Each node starts out in a single-node cluster where it is the leader. Only when it is told to (and given the addresses of other nodes) does it form a multi-node cluster.</p> <p>Each node also needs a permanent store for the append-only log. The HashiCorp Raft library suggests <a href="https://github.com/hashicorp/raft-boltdb">boltdb</a>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">nodeId</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="w"> </span><span class="o">*</span><span class="nx">kvFsm</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">);</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create data directory: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raftboltdb</span><span class="p">.</span><span class="nx">NewBoltStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;bolt&quot;</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create bolt store: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewFileSnapshotStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;snapshot&quot;</span><span class="p">),</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create snapshot store: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">ResolveTCPAddr</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not resolve address: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">transport</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewTCPTransport</span><span class="p">(</span><span class="nx">raftAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Second</span><span class="o">*</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create tcp transport: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">raftCfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">DefaultConfig</span><span class="p">()</span> <span class="w"> </span><span class="nx">raftCfg</span><span class="p">.</span><span class="nx">LocalID</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span 
class="nx">nodeId</span><span class="p">)</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewRaft</span><span class="p">(</span><span class="nx">raftCfg</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">transport</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create raft instance: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Cluster consists of unjoined leaders. 
Picking a leader and</span> <span class="w"> </span><span class="c1">// creating a real cluster is done manually after startup.</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">BootstrapCluster</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Configuration</span><span class="p">{</span> <span class="w"> </span><span class="nx">Servers</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">),</span> <span class="w"> </span><span class="nx">Address</span><span class="p">:</span><span class="w"> </span><span class="nx">transport</span><span class="p">.</span><span class="nx">LocalAddr</span><span class="p">(),</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Now let's dig into how nodes learn about each other.</p> <h3 id="an-http-api">An HTTP API</h3><p>This key-value store application will have an HTTP API serving two purposes:</p> <ul> <li>Cluster management: telling a leader to add followers</li> <li>Key-value storage: setting and getting keys</li> </ul> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span 
class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span> <span class="p">}</span> </pre></div> <h4 id="cluster-management">Cluster management</h4><p>In this library, the leader is told to add other nodes as its followers. (This feels backwards to me, but it is what it is!)</p> <p>For this, the library requires each follower's node ID and the address of its internal TCP port for Raft messages.</p> <p>These will both be parameters we give each node later on when the node process is started.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">hs</span><span class="w"> </span><span class="n">httpServer</span><span class="p">)</span><span class="w"> </span><span class="n">joinHandler</span><span class="p">(</span><span class="n">w</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">followerId</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Query</span><span class="p">()</span><span class="o">.</span><span class="n">Get</span><span class="p">(</span><span class="s2">&quot;followerId&quot;</span><span 
class="p">)</span> <span class="w"> </span><span class="n">followerAddr</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Query</span><span class="p">()</span><span class="o">.</span><span class="n">Get</span><span class="p">(</span><span class="s2">&quot;followerAddr&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">hs</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">State</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">raft</span><span class="o">.</span><span class="n">Leader</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Set the status before writing the body.</span> <span class="w"> </span><span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="n">json</span><span class="o">.</span><span class="n">NewEncoder</span><span class="p">(</span><span class="n">w</span><span class="p">)</span><span class="o">.</span><span class="n">Encode</span><span class="p">(</span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Error</span><span class="w"> </span><span class="n">string</span><span class="w"> </span><span class="err">`</span><span class="n">json</span><span class="p">:</span><span class="s2">&quot;error&quot;</span><span class="err">`</span> <span class="w"> </span><span class="p">}{</span> <span class="w"> </span><span class="s2">&quot;Not the leader&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">hs</span><span class="o">.</span><span class="n">r</span><span class="o">.</span><span class="n">AddVoter</span><span class="p">(</span><span class="n">raft</span><span class="o">.</span><span class="n">ServerID</span><span class="p">(</span><span class="n">followerId</span><span class="p">),</span><span class="w"> </span><span class="n">raft</span><span class="o">.</span><span class="n">ServerAddress</span><span class="p">(</span><span class="n">followerAddr</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">.</span><span class="n">Error</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">log</span><span class="o">.</span><span class="n">Printf</span><span class="p">(</span><span class="s2">&quot;Failed to add follower: </span><span class="si">%s</span><span class="s2">&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">Error</span><span class="p">(</span><span class="n">w</span><span 
class="p">,</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">StatusText</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="n">http</span><span class="o">.</span><span class="n">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">w</span><span class="o">.</span><span class="n">WriteHeader</span><span class="p">(</span><span class="n">http</span><span class="o">.</span><span class="n">StatusOK</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h4 id="key-value-storage">Key-value storage</h4><p>This part of the HTTP API exposes setting and getting.</p> <h5 id="set">Set</h5><p>To set a key, instead of modifying the local database directly, we pass a message to the Raft cluster to store a log that contains the key and value.</p> <p>Since we read log messages in <code>kvFsm.Apply</code> and <code>kvFsm.Restore</code> as a JSON encoding of the <code>setPayload</code> struct we created, we must write log messages in the same format. 
Or, specifically in this case, we just expect that the user passes a JSON body that matches the <code>setPayload</code> struct.</p> <p>Then we call <code>Apply</code> on the Raft instance with the log message to get this process going.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">setHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadAll</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span 
class="s">&quot;Could not read key-value in http request: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">future</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">Apply</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="mi">500</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Blocks until completion</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Error</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span 
class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not write key-value: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Response</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not write key-value, application: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span 
class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p class="note"> I'm not completely sure if <code>future.Response()</code> is supposed to be called from inside the <code>future.Error()</code> error block. You can <a href="https://pkg.go.dev/github.com/hashicorp/raft#ApplyFuture">see the docs</a> for yourself. </p><h5 id="get">Get</h5><p>If we wanted to be completely consistent we would need to pass a <code>read</code> message through to the Raft cluster and check its result for a key's value. 
We'd need to implement that <code>read</code> message in the state machine.</p> <p>But if we don't care strongly about consistency for reads we can just read the local in-memory store, skipping the Raft cluster.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">getHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;key&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Load</span><span class="p">(</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">rsp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Data</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:&quot;data&quot;`</span> <span class="w"> </span><span class="p">}{</span><span class="nx">value</span><span class="p">.(</span><span class="kt">string</span><span class="p">)}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not encode key-value in http response: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusText</span><span 
class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusInternalServerError</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And that's it for the server!</p> <h3 id="configuration">Configuration</h3><p>Let's throw together a quick helper for grabbing configuration from the CLI.</p> <p>When the process is started, each node must be configured with a Raft-level TCP address, a Raft-level unique node ID, and an HTTP address (for our application).</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">httpPort</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">raftPort</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span 
class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--node-id&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--http-port&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span 
class="w"> </span><span class="s">&quot;--raft-port&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --node-id&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --raft-port&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span 
class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --http-port&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span> <span class="p">}</span> </pre></div> <p>And finally, the <code>main()</code> that brings it all together.</p> <h3 id="main">main</h3><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">sync</span><span class="p">.</span><span class="nx">Map</span><span class="p">{}</span> <span class="w"> </span><span class="nx">kf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">kvFsm</span><span class="p">{</span><span class="nx">db</span><span class="p">}</span> <span class="w"> </span><span class="nx">dataDir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;data&quot;</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span 
class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Could not create data directory: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;raft&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;localhost:&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="p">,</span><span class="w"> </span><span class="nx">kf</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span 
class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">}</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/set&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">setHandler</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/get&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">getHandler</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/join&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">joinHandler</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span 
class="p">}</span> </pre></div> <p>Build it.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">mod</span><span class="w"> </span><span class="nx">init</span><span class="w"> </span><span class="nx">raft</span><span class="o">-</span><span class="nx">example</span> <span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">mod</span><span class="w"> </span><span class="nx">tidy</span> <span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">build</span> </pre></div> <p>And give it a shot. :)</p> <p>Terminal 1:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span> </pre></div> <p>Terminal 2:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./raft-example<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span> </pre></div> <p>Terminal 3, tell 1 to have 2 follow it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8222/join?followerAddr=localhost:2223&amp;followerId=node2&#39;</span> </pre></div> <p>Terminal 3, now add a key:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span><span class="s1">&#39;localhost:8222/set&#39;</span><span class="w"> </span>-d<span class="w"> </span><span 
class="s1">&#39;{&quot;key&quot;: &quot;x&quot;, &quot;value&quot;: &quot;23&quot;}&#39;</span><span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;content-type: application/json&#39;</span> </pre></div> <p>Terminal 3, now get the key from either server:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8222/get?key=x&#39;</span> <span class="o">{</span><span class="s2">&quot;data&quot;</span>:<span class="s2">&quot;23&quot;</span><span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8223/get?key=x&#39;</span> <span class="o">{</span><span class="s2">&quot;data&quot;</span>:<span class="s2">&quot;23&quot;</span><span class="o">}</span> </pre></div> <p>And we're golden!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Following up on that &quot;build a distributed postgres&quot; post I wanted to write down a shorter intro to building a stateful, distributed application using Hashicorp&#39;s Raft library.<br><br>So, here&#39;s a new blog post!<br><br>Also, try reading the Raft paper! It&#39;s not bad 😀<a href="https://t.co/C4S3uzxm0W">https://t.co/C4S3uzxm0W</a> <a href="https://t.co/L3Wwawe0UC">pic.twitter.com/L3Wwawe0UC</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1571662239559716865?ref_src=twsrc%5Etfw">September 19, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/minimal-key-value-store-with-hashicorp-raft.htmlSat, 17 Sep 2022 00:00:00 +0000What's the big deal about key-value databases like FoundationDB and RocksDB?http://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.html<p>Let's assume you're familiar with basic SQL databases like PostgreSQL and MySQL, and document databases like MongoDB and Elasticsearch. 
You probably know Redis too.</p> <p>But you're hearing more and more about embedded key-value stores like <a href="http://rocksdb.org/">RocksDB</a>, <a href="https://github.com/google/leveldb">LevelDB</a>, <a href="https://github.com/cockroachdb/pebble">PebbleDB</a>, and so on. And you're hearing about distributed key-value databases like <a href="https://www.foundationdb.org/">FoundationDB</a> and <a href="https://tikv.org/">TiKV</a>.</p> <p>What's the big deal? Aren't these just the equivalent of Redis or Java's ConcurrentHashMap?</p> <p>Let's take a look.</p> <h3 id="extensible-databases">Extensible databases</h3><p>Over the last 10 years or so (at least), databases have become more extensible. MySQL has around <a href="https://dev.mysql.com/doc/refman/8.0/en/storage-engines.html">10 different open-source storage engines</a>. More surely exist in the wild.</p> <p>Mongo supports <a href="https://www.mongodb.com/docs/manual/core/storage-engines/">multiple storage engines</a>. Relatively late, PostgreSQL version 12 added support for <a href="https://www.postgresql.org/docs/current/tableam.html">pluggable storage engines</a>.</p> <p class="note"> <a href="https://github.com/orioledb/orioledb">OrioleDB</a> and <a href="https://www.citusdata.com/blog/2021/03/06/citus-10-columnar-compression-for-postgres/">Citus 10's columnar compression</a> are two particularly interesting projects making use of PostgreSQL's pluggable storage engine support. But since neither uses an embedded key-value store, I won't talk about them more in this post. </p><p>And so on.</p> <h4 id="but-why?">But why?</h4><p>Swapping out storage engines allows you to tune the performance of your database. 
It can allow you to swap out row-oriented storage for column-oriented storage (useful for analytics workloads).</p> <p>It can allow you to swap B-Trees (traditional choice) for <a href="http://www.benstopford.com/2015/02/14/log-structured-merge-trees/">LSM Trees</a> (new hotness) as the underlying storage method (useful for optimizing write-heavy workloads).</p> <p>And since some storage engines themselves are built on distributed consensus (like <a href="https://github.com/apple/foundationdb">FoundationDB</a> and <a href="https://github.com/tikv/tikv">TiKV</a>), it may even allow you to turn a non-distributed database into a distributed database.</p> <h3 id="mapping-sql-to-key-value-storage">Mapping SQL to key-value storage</h3><p>But how the heck do you turn SQL, row-oriented data, into key-value data?</p> <p>CockroachDB is a SQL database built on <a href="https://www.cockroachlabs.com/blog/pebble-rocksdb-kv-store/">RocksDB originally and now their own LevelDB-inspired database</a> called <a href="https://github.com/cockroachdb/pebble">PebbleDB</a>.</p> <p>The reason I mention that here is because they maintain a great doc about <a href="https://github.com/cockroachdb/cockroach/blob/master/docs/tech-notes/encoding.md">their method of encoding rows to key-value form</a>.</p> <p>To simplify that doc though you can imagine mapping each row to a key-value form like this:</p> <div class="highlight"><pre><span></span><span class="nx">$</span><span class="p">{</span><span class="nx">TABLE_IDENTIFIER</span><span class="p">}</span><span class="nx">_$</span><span class="p">{</span><span class="nx">PRIMARY_KEY</span><span class="p">}</span><span class="nx">_$</span><span class="p">{</span><span class="nx">ROW_IDENTIFIER</span><span class="p">}</span><span class="o">:</span><span class="w"> </span><span class="nx">$</span><span class="p">{</span><span class="nx">ENCODED_VALUE</span><span class="p">}</span> </pre></div> <p>Embedded key-value stores almost always support 
efficient scanning of rows by a key-prefix. This means that you can efficiently grab all rows within a table by prefix-scanning on the table identifier. If you also include a primary key value along with the table identifier prefix, you get efficient primary key lookup.</p> <p>Even though the key space is flat.</p> <p>For the encoded value you can pick any encoding scheme: as inefficient as JSON or as efficient as some binary scheme like Protocol Buffers or Parquet.</p> <p class="note"> Thanks to <a href="https://twitter.com/justinjaffray">Justin Jaffray</a> for pointing me at the CockroachDB doc and confirming some of my thoughts on encoding strategies. </p><h4 id="tutorials">Tutorials</h4><p>I've written a couple of tutorials on building a database. They build on top of embedded key-value stores. If you're interested in seeing minimal code walkthroughs of how this process can work, check these posts out:</p> <ul> <li><a href="https://notes.eatonphil.com/distributed-postgres.html">Let's build a distributed Postgres proof of concept</a></li> <li><a href="https://notes.eatonphil.com/documentdb.html">Writing a document database from scratch in Go: Lucene-like filters and indexes</a></li> </ul> <h3 id="major-aspects-of-key-value-stores">Major aspects of key-value stores</h3><p>Now that you understand how a database can map to a key-value store, let's take a look at the particular properties that distinguish all these key-value stores from systems like Redis and Memcached.</p> <h4 id="reliable-storage">Reliable storage</h4><p>Maybe the single most important thing a storage system does is actually store data reliably. You can't just <code>open()</code> and <code>write()</code>. 
To quote Dan Luu, <a href="https://danluu.com/file-consistency/">files are hard</a>.</p> <p>Deferring storage correctness to a dedicated system means database developers can worry about other aspects of database development.</p> <h4 id="embeddable">Embeddable</h4><p>Beyond being reliable, the storage needs to run in process. Redis, for example, is not embeddable. There are many other things on top of the storage that need to happen in a high-level database, and RPC calls between processes for storage are unnecessary overhead.</p> <h4 id="efficient-prefix-scans">Efficient prefix scans</h4><p>As mentioned above, support for scans is pretty important for how indexes and namespaces (tables in SQL) get mapped to key-value queries.</p> <p>You shouldn't need to look through the rows of every table in the flat key space to find the rows for one table.</p> <h4 id="additional-aspects">Additional aspects</h4><p>The above isn't a complete list. Different stores provide different useful aspects like improved performance on certain workloads or in certain environments, builtin transactions, and so on.</p> <p>And sometimes it's helpful just to have an embedded store in your language rather than going through a foreign-function interface.</p> <h3 id="survey-of-databases-built-on-embedded-key-value-stores">Survey of databases built on embedded key-value stores</h3><p>Lastly, let's take a look at a few databases that build on top of embedded key-value stores.</p> <p>Note that some of them are not the primary version of the database (e.g. MyRocks vs MySQL, MongoRocks vs Mongo). Some of them are the primary version (e.g. 
CockroachDB, YugabyteDB).</p> <h4 id="document-databases-built-on-key-value-stores">Document databases built on key-value stores</h4><ul> <li><a href="https://www.percona.com/doc/percona-server-for-mongodb/3.4/mongorocks.html">MongoRocks</a> (Mongo on RocksDB)</li> </ul> <h4 id="sql-databases-built-on-key-value-stores">SQL databases built on key-value stores</h4><ul> <li><a href="http://myrocks.io/">MyRocks</a> (MySQL on RocksDB)</li> <li><a href="https://www.cockroachlabs.com">CockroachDB</a> (RocksDB originally, now their own PebbleDB)</li> <li><a href="https://www.yugabyte.com/blog/how-we-built-a-high-performance-document-store-on-rocksdb/">YugabyteDB</a> (on DocDB on RocksDB)</li> <li><a href="https://www.gridgain.com/resources/blog/apache-ignite-3-alpha-3-apache-calcite-raft-and-lsm-tree">Apache Ignite</a> (Calcite on RocksDB)</li> </ul> <h4 id="redis-compatible-databases-built-on-key-value-stores">Redis-compatible databases built on key-value stores</h4><ul> <li><a href="https://engineering.fb.com/2021/08/06/core-data/zippydb/">ZippyDB</a> (Redis-compatible database on RocksDB)</li> <li><a href="https://redis.com/blog/hood-redis-enterprise-flash-database-architecture/">Redis Enterprise Flash</a> (Redis on RocksDB)</li> </ul> <h4 id="other-databases-built-on-key-value-stores">Other databases built on key-value stores</h4><ul> <li><a href="https://thenewstack.io/instagram-supercharges-cassandra-pluggable-rocksdb-storage-engine/">Rocksandra</a> (Cassandra on RocksDB)</li> </ul> <p>Missing a database? Let me know!</p> <h4 id="separately,-distributed-key-value-stores">Separately, distributed key-value stores</h4><p>There is a different kind of key-value store that is a standalone app designed for distributed data. 
This list includes <a href="https://www.consul.io/">Consul</a>, <a href="https://etcd.io/docs/v3.4/learning/why/">etcd</a>, likely <a href="https://www.foundationdb.org/">FoundationDB</a>, and likely <a href="https://engineering.fb.com/2021/08/06/core-data/zippydb/">ZippyDB</a>. (There's a nice comparison table about some of these databases on the etcd page.)</p> <p>These systems are designed to be used sort of like Redis except that they are persistent and reliable stores. They are designed to always be up and always correct. For that reason they form the data storage backbone of core infrastructure like Kubernetes.</p> <p>This is possibly how <a href="https://www.snowflake.com/blog/how-foundationdb-powers-snowflake-metadata-forward/">Snowflake uses FoundationDB</a> but I'm not 100% sure.</p> <p>TiKV is not an embedded key-value database either, but as far as I can tell it's not used the same way etcd and Consul are. It forms the backbone of <a href="https://en.pingcap.com/">TiDB</a>, an HTAP (hybrid OLAP/OLTP) SQL database.</p> <p>Maybe FoundationDB and TiKV deserve their own new category.</p> <p>But in general these databases have an RPC API that you communicate with over TCP. They are not generally embedded. You manage their process(es) separately.</p> <h3 id="conclusion">Conclusion</h3><p>So in this post we saw that databases are extensible. Storage engines are often swappable. Dedicated embedded key-value stores allow database developers to hand off data storage to a dedicated library. Different key-value stores have different performance characteristics that help developers and operators tune a database for their workload.</p> <p>Embedded key-value stores are a great foundation for all kinds of databases: SQL databases like CockroachDB, document databases like Mongo, wide-column databases like Cassandra, and caching databases like ZippyDB or Redis Enterprise Flash.</p> <p>This is a complex topic with many, many variations of systems. 
Hopefully this was a useful introduction.</p> <p>Overall if you're not a database developer and you're not running databases at a massive scale, you can probably ignore the details of the storage layer.</p> <p>Did I get something wrong? Or miss something important? Let me know. :)</p> <h3 id="corrections">Corrections</h3><ul> <li>An earlier version of this post suggested that FoundationDB was embedded. It is not. Thanks <a href="https://lobste.rs/s/avljlh/what_s_big_deal_about_embedded_key_value#c_rx0oid">adaszko on Lobsters for correcting</a>.</li> <li>An earlier version of this post suggested that TiKV was embedded. It is not. Thanks <a href="https://news.ycombinator.com/user?id=eis">eis on Hacker News</a>.</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">What&#39;s the big deal about embedded key-value databases like FoundationDB ands RocksDB?<br><br>I wrote a new blog post that might give a little context. :)<a href="https://t.co/kNFM1hVGx6">https://t.co/kNFM1hVGx6</a> <a href="https://t.co/H4SouStZHk">pic.twitter.com/H4SouStZHk</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1562106582544039937?ref_src=twsrc%5Etfw">August 23, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/whats-the-big-deal-about-key-value-databases.htmlTue, 23 Aug 2022 00:00:00 +0000SQLite has pretty limited builtin functionshttp://notes.eatonphil.com/2022-08-21-sqlite-limited-builtin-functions.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-08-21-sqlite-limited-builtin-functions.html'" /> </head><p>This is an external post of mine. 
Click <a href="https://datastation.multiprocess.io/blog/2022-08-21-sqlite-limited-builtin-functions.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2022-08-21-sqlite-limited-builtin-functions.htmlSun, 21 Aug 2022 00:00:00 +0000Container scheduling strategies for integration testing 14 different databases in Github Actionshttp://notes.eatonphil.com/2022-07-25-database-integration-testing.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-07-25-database-integration-testing.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-07-25-database-integration-testing.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2022-07-25-database-integration-testing.htmlMon, 25 Jul 2022 00:00:00 +0000Implementing a simple jq clone in Go, and basics of Go memory profilinghttp://notes.eatonphil.com/implementing-a-jq-clone-in-go.html<p>In this post we'll build a basic jq clone in Go. It will only be able to pull a single path out of each object it reads. It won't be able to do filters, mapping, etc.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>head<span class="w"> </span>-n2<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span> <span class="s2">&quot;https://api.github.com/repos/petroav/6.828&quot;</span> <span class="s2">&quot;https://api.github.com/repos/rspt/rspt-theme&quot;</span> </pre></div> <p>We'll start by building a "control" implementation that uses Go's builtin JSON library with a JSON path tool on top.</p> <p>Then we'll implement a basic path-aware JSON parser in 600 lines of Go. 
It's going to use a technique (that may have a better name but) I call "partial parsing" or "fuzzy parsing" where we fully parse what we care about and only <em>sort of</em> parse the rest.</p> <p>Why partial parsing? There are two general reasons. One is to use less memory than parsers that must always turn all of a text into an object in your language. The other is for when the language has complexities you don't want or need to deal with. We'll basically have to deal with all the complexities of JSON so this post is about the former reason: using less memory. I've written about a case for the second reason though in <a href="https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html">building a simple, fast SCSS implementation</a>.</p> <p class="note"> This partial parser is more complex than a typical handwritten parser. If you are unfamiliar with handwritten JSON parsers, you may want to take a look at <a href="https://notes.eatonphil.com/tags/json.html">previous articles</a> I've written about parsing JSON. </p><p>Once we get this partial parser working we'll turn to Go's builtin profiler to find what we can do to make it faster.</p> <p>All code for this post is <a href="https://github.com/eatonphil/jqgo">available on Github</a>.</p> <h3 id="machine-specs,-versions">Machine specs, versions</h3><p>Since we're going to be doing some rudimentary comparisons of performance, here are my details. 
I am running everything on a dedicated server, <a href="https://us.ovhcloud.com/bare-metal/rise/rise-1/">OVH Rise-1</a>.</p> <ul> <li>RAM: 64 GB DDR4 ECC 2,133 MHz</li> <li>Disk: 2x450 GB SSD NVMe in Soft RAID</li> <li>Processor: Intel Xeon E3-1230v6 - 4c/8t - 3.5 GHz/3.9 GHz</li> </ul> <p>And relevant versions:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>jq<span class="w"> </span>--version jq-1.6 $<span class="w"> </span>go<span class="w"> </span>version go<span class="w"> </span>version<span class="w"> </span>go1.18<span class="w"> </span>linux/amd64 $<span class="w"> </span>uname<span class="w"> </span>-a Linux<span class="w"> </span>phil<span class="w"> </span><span class="m">5</span>.18.10-100.fc35.x86_64<span class="w"> </span><span class="c1">#1 SMP PREEMPT_DYNAMIC Thu Jul 7 17:41:37 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux</span> </pre></div> <p>Now buckle up!</p> <h3 id="jq-using-go's-builtin-json-library">jq using Go's builtin JSON library</h3><p>This is a very simple program. We just parse JSON data from stdin in a loop. After each parse we'll call an <code>extractValueAtPath</code> function to grab the value at the path the user asks for.</p> <p>To keep our path "parser" very simple we'll treat array access the same as object access. 
So we'll look for <code>x.0</code> instead of <code>x[0]</code>, unlike jq.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;strconv&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span 
class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span 
class="p">(</span><span class="o">&amp;</span><span class="nx">a</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">enc</span><span 
class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">v</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Then we implement the <code>extractValueAtPath</code> function itself, entering into JSON arrays and objects until we reach the end of the path.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">extractValueAtPath</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">a</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arr</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">v</span><span class="p">.([]</span><span class="kt">any</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">part</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">arr</span><span class="p">[</span><span class="nx">n</span><span class="p">]</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">m</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">v</span><span class="p">.(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Path into a non-map</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">m</span><span class="p">[</span><span class="nx">part</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Path does not exist</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Alright, let's give it a go module and build and run it!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>control $<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy $<span class="w"> </span>go<span class="w"> </span>build <span class="c1"># Grab a test file</span> $<span class="w"> </span>curl<span class="w"> </span>https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> </span><span class="s1">&#39;.[]&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>large-file.json $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>head<span class="w"> </span>-n2<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span> <span class="s2">&quot;https://api.github.com/repos/petroav/6.828&quot;</span> <span class="s2">&quot;https://api.github.com/repos/rspt/rspt-theme&quot;</span> </pre></div> <p>Sweet. 
Now let's make sure it produces the same thing as jq.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>control.test $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test $<span class="w"> </span>diff<span class="w"> </span>jq.test<span class="w"> </span>control.test $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">0</span> </pre></div> <p>Great! It's working for a basic query. Let's see how it performs.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./control &#39;.repo.url&#39; &gt; control.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | jq &#39;.repo.url&#39; &gt; jq.test&quot;</span> Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>control.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">310</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> 
</span><span class="m">14</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">296</span>.2<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.3<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">344</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.1<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">348</span>.8<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">27</span>.7<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.8<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">358</span>.5<span class="w"> 
</span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Summary <span class="w"> </span><span class="s1">&#39;cat large-file.json | ./control &#39;</span>.repo.url<span class="s1">&#39; &gt; control.test&#39;</span><span class="w"> </span>ran <span class="w"> </span><span class="m">1</span>.15<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.05<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | jq &#39;</span>.repo.url<span class="s1">&#39; &gt; jq.test&#39;</span> </pre></div> <p>Now that's surprising! This naive implementation in Go is a bit faster than standard jq. But our implementation supports a heck of a lot less than jq. So this benchmark on its own isn't incredibly meaningful.</p> <p>However, it's a good base for comparing to our next implementation.</p> <p class="note"> Astute readers may notice that this version doesn't use a buffered reader from stdin, while the next version will. I tried this version with and without wrapping stdin in a buffered reader but it didn't make a meaningful difference. It might be because Go's JSON decoder does its own buffering. I'm not sure. </p><p>Let's do the fun implementation.</p> <h3 id="partial-parsing">Partial parsing</h3><p>Unlike a typical handwritten parser this partial parser is going to contain almost two parsers. One parser will care exactly about the structure of JSON. The other parser will only care about reading past the current value (whether it be a number or string or array or object, etc.) The path we pass to the parser will be used to decide whether each value should be fully parsed or partially parsed.</p> <p class="note"> I'll reiterate: this partial parser is more complex than a typical handwritten parser. 
If you are unfamiliar with handwritten JSON parsers, you may want to take a look at <a href="https://notes.eatonphil.com/tags/json.html">previous articles</a> I've written about parsing JSON. </p><p>The shell of this partial parser is going to look similar to the shell of the first parser.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bufio&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;strconv&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="p">}</span> <span class="o">...</span><span class="w"> </span><span class="nx">TO</span><span class="w"> </span><span class="nx">IMPLEMENT</span><span class="w"> </span><span class="o">...</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">os</span><span 
class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span> <span class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdout</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">jr</span><span class="w"> </span><span class="nx">jsonReader</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="kt">any</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span 
class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">reset</span><span class="p">()</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Read&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="p">))</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalln</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">val</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalln</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Except instead of using the builtin JSON parser we'll call our own <code>extractDataFromJsonPath</code> function that handles parsing and extraction all at once.</p> <p>Before doing that we'll add a few helper functions. The first one grabs a byte from a reader and stores the read byte locally (so we can print out all read bytes if the program fails).</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> 
</span><span class="p">(</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">ReadByte</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The <code>reset</code> method clears the <code>read</code> bytes and gets called before each object is parsed in the loop in <code>main</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span 
class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">reset</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">read</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Now let's get into <code>extractDataFromJsonPath</code>.</p> <h3 id="extractdatafromjsonpath">extractDataFromJsonPath</h3><p>This is the real parser. It expects a JSON object and fully parses the object, almost.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span 
class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Make sure we&#39;re actually going into an object</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span 
class="sc">&#39;{&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected opening curly brace, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kt">any</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="c1">// We found the end of the object</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;}&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Key-value pairs must be separated by commas</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;,&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected comma between key-value pairs, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Grab the key</span> <span class="w"> </span><span class="nx">s</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Find a colon separating key from value</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span 
class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;:&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected colon, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span 
class="p">}</span> </pre></div> <p>Up to this point it looks like any old handwritten parser. There are a few helpers in there (<code>eatWhitespace</code>, <code>expectString</code>) we'll implement shortly.</p> <p>But once we see each key and are ready to look for a value we can decide if we need to fully parse the value (if the path goes into this key) or if we can partially parse the value (because the path does not go into this key).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If the key is not the start of this path, skip past this value</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatValue</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Otherwise this is a path we want, grab the value</span> <span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And that's it! The core parsing loop is done. The meat now becomes 1) the <code>eatValue</code> function that partially parses JSON and 2) the <code>expectValue</code> function that either encounters a scalar value and returns it or recursively calls <code>extractDataFromJsonPath</code> to enter some new object.</p> <h4 id="notes-on-helper-naming">Notes on helper naming</h4><p>There are three main kinds of helpers you'll see. <code>expectX</code> helpers like <code>expectString</code> will return early with an error if they fail to find what they're looking for. <code>eatX</code> helpers like <code>eatWhitespace</code> will not return any parsed value, only an error, and will only move the read cursor forward. And <code>tryX</code> helpers like <code>tryNumber</code> will do the same thing as the <code>expectX</code> helpers but return an additional boolean value 
so the caller can decide whether or not to make other attempts at parsing.</p> <p>But first let's fill in the two helpers we skipped. First off, <code>eatWhitespace</code>.</p> <h3 id="eatwhitespace">eatWhitespace</h3><p>This function peeks at the next byte and discards it for as long as the next byte is whitespace.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> 
</span><span class="nx">isWhitespace</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39; &#39;</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\t&#39;</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isWhitespace</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>That's it! 
Next we need to fill in <code>expectString</code>.</p> <h3 id="expectstring">expectString</h3><p>This is a standard handwritten parser helper that looks for a double quote and keeps collecting bytes until it finds an ending double quote that is not escaped.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">expectString</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span> <span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="o">//</span><span class="w"> </span><span class="n">Look</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">opening</span><span class="w"> </span><span class="n">quote</span> <span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">&quot;Expected double quote to start string, got: &#39;</span><span class="si">%s</span><span class="s2">&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">var</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="n">byte</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Just</span><span class="w"> </span><span class="n">skip</span> <span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">byte</span><span class="p">(</span><span 
class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Overwrite</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">escaped</span><span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">quote</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">s</span><span class="p">[</span><span class="n">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span> <span class="w"> </span><span class="c1">// Skip the append below so the quote is not added twice</span> <span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Otherwise</span><span class="w"> </span><span class="n">it</span><span class="s1">&#39;s the actual end</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">s</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">)</span> <span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="n">nil</span> <span class="p">}</span> </pre></div> <p>Standard stuff! Now let's get back to those meaty functions we introduced before, starting with <code>expectValue</code>.</p> <h3 id="expectvalue">expectValue</h3><p>This function is called by <code>extractDataFromJsonPath</code> when it wants to fully parse a value.</p> <p>If we see a left curly brace, we call <code>extractDataFromJsonPath</code> with it.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span> </pre></div> <p>Otherwise if we see a left bracket we call a new helper <code>extractArrayDataFromJsonPath</code> which will be almost identical to <code>extractDataFromJsonPath</code> but for parsing array syntax.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">extractArrayDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>If the value we're trying to parse isn't an array or object and there's more of a path then we have to return null because we can't enter into a scalar value.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Can&#39;t go any further into a path</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">path</span><span class="p">)</span><span class="w"> </span>!<span class="p">=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Reached the end of this object but more of</span> <span class="w"> </span><span class="c1">// the path remains. 
So this object doesn&#39;t</span> <span class="w"> </span><span class="c1">// contain this path.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we try to parse a scalar (numbers, strings, booleans, <code>null</code>) and ultimately return an error if nothing worked.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected scalar, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <p>Let's implement <code>tryScalar</code> and its dependencies now. And we'll come back to <code>extractArrayDataFromJsonPath</code> afterward.</p> <h3 id="tryscalar">tryScalar</h3><p>The <code>tryScalar</code> function is similar to <code>expectValue</code>. It's called <code>tryScalar</code> because it's allowed to fail.</p> <p>We peek at the first byte and switch on a dedicated parsing helper based on it.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">val</span><span class="p">),</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;t&#39;</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;f&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;false&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;n&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;null&quot;</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">tryNumber</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>This passes control flow to two new functions, <code>expectIdentifier</code> and <code>tryNumber</code>. 
Let's do <code>expectIdentifier</code> next.</p> <h3 id="expectidentifier">expectIdentifier</h3><p>This function tries to match the reader on a string passed to it.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectIdentifier</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">ident</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">ident</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span 
class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">ReadByte</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ident</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown value: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p class="note"> Thanks <a href="https://twitter.com/deliberatecoder">Michael Lynch</a> for pointing out in an earlier version that <code>expectIdentifier</code> does not need to <code>Peek</code>/<code>Discard</code> but can just <code>ReadByte</code> instead. </p><h3 id="trynumber">tryNumber</h3><p>This function tries to parse a number. We'll do a very lazy number parser that will <em>most likely</em> allow all valid numbers. Internally we'll call <code>json.Unmarshal</code> on the bytes we build up to do the conversion itself.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">tryNumber</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb nb-Type">bool</span><span class="p">,</span><span class="w"> </span><span class="n">any</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Loop</span><span class="w"> </span><span class="n">trying</span><span class="w"> </span><span class="n">to</span><span class="w"> </span><span class="n">find</span><span class="w"> </span><span class="n">all</span><span class="w"> </span><span 
class="n">number</span><span class="o">-</span><span class="n">like</span><span class="w"> </span><span class="n">characters</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">row</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">bs</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="n">isNumberCharacter</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">&#39;0&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span 
class="o">&lt;=</span><span class="w"> </span><span class="s1">&#39;9&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;e&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;.&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;E&#39;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">isNumberCharacter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">number</span><span class="p">,</span><span class="w"> </span><span class="n">c</span><span class="p">)</span> <span class="w"> </span><span class="n">r</span><span class="o">.</span><span class="n">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">number</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">false</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">var</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="n">float64</span> <span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">.</span><span class="n">Unmarshal</span><span class="p">(</span><span class="n">number</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">n</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="bp">true</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="p">}</span> </pre></div> <p>If we can't find a number, that's OK. We signal that by returning <code>false</code> as the first return value.</p> <h3 id="outstanding-functions">Outstanding functions</h3><p>OK, we've come a long way building out helper functions. The two remaining helpers are <code>extractArrayDataFromJsonPath</code> and <code>eatValue</code>. 
Let's finish up these real parser functions before getting to <code>eatValue</code>, the primary partial parsing function.</p> <h3 id="extractarraydatafromjsonpath">extractArrayDataFromJsonPath</h3><p>This function is almost identical to <code>extractDataFromJsonPath</code> but rather than parsing key-value pairs inside curly braces it parses values inside brackets.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">extractArrayDataFromJsonPath</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Path inside an array must be an integer</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">path</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span 
class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for opening bracket. Make sure we&#39;re in an array</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">readByte</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected opening bracket, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span 
class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="kt">any</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="c1">// Found closing bracket, exit the array</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;]&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Array values must be separated by a comma</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;,&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected comma between array values, got: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">b</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Just like <code>extractDataFromJsonPath</code>, it calls either <code>eatValue</code> or <code>expectValue</code> depending on whether the current index matches the requested path.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If this index is not the start of the path, skip past this value</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatValue</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectValue</span><span class="p">(</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span 
class="p">}</span> </pre></div> <p>That's it for full parser functions! Let's do the partial parser, <code>eatValue</code>.</p> <h3 id="eatvalue">eatValue</h3><p>This function is simpler than the full parser functions we wrote before.</p> <p>First off it looks for the simple case where the value is a scalar.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">eatValue</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">)</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span> <span class="w"> </span><span class="n">inString</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="bp">false</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="n">byte</span> <span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">ok</span><span class="p">,</span><span class="w"> </span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">tryScalar</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">It</span><span class="w"> </span><span class="n">was</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="n">scalar</span><span class="p">,</span><span class="w"> </span><span class="n">we</span><span class="s1">&#39;re done!</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>All it does is read until the value ends.</p> <p>If the value is not a scalar though we need to read past complete JSON arrays and/or objects.</p> <p>To do this we'll simply read through bytes, monitoring a stack of open and close braces and brackets. 
If we enter a string we'll skip all bytes inside the string until the string ends.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Otherwise it&#39;s an array or object</span> <span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">inString</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Two \\-es cancel each other out</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span 
class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="p">:</span> <span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;]&#39;</span><span class="p">:</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unexpected end of array: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="p">:</span> <span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;}&#39;</span><span class="p">:</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unexpected end of object: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="p">:</span> <span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="c1">// Closing quote case handled elsewhere, above</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And we're finally done with the first pass of the path-aware jq implementation.</p> <h3 id="build,-test,-benchmark">Build, test, benchmark</h3><p>Let's give it a Go module, then build and test it.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>jqgo $<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>curl<span class="w"> </span>https://raw.githubusercontent.com/json-iterator/test-data/master/large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> 
</span><span class="s1">&#39;.[]&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>large-file.json $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jqgo.test $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test $<span class="w"> </span>diff<span class="w"> </span>jq.test<span class="w"> </span>jqgo.test $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">0</span> </pre></div> <p>Great! :) Let's benchmark it against jq and the control implementation.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./control/control &#39;.repo.url&#39; &gt; control.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./jqgo &#39;.repo.url&#39; &gt; jqgo.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | jq &#39;.repo.url&#39; &gt; jq.test&quot;</span> Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>control.test 
<span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">302</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">3</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">283</span>.7<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">53</span>.1<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">297</span>.4<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">309</span>.0<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jqgo.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">258</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">230</span>.3<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">47</span>.6<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> 
</span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">256</span>.3<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">262</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="m">11</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">357</span>.6<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">350</span>.0<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">28</span>.3<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.0<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">362</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Summary <span class="w"> </span><span class="s1">&#39;cat large-file.json | ./jqgo &#39;</span>.repo.url<span class="s1">&#39; &gt; jqgo.test&#39;</span><span class="w"> </span>ran <span class="w"> </span><span class="m">1</span>.17<span class="w"> </span>±<span class="w"> </span><span 
class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | ./control/control &#39;</span>.repo.url<span class="s1">&#39; &gt; control.test&#39;</span> <span class="w"> </span><span class="m">1</span>.38<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | jq &#39;</span>.repo.url<span class="s1">&#39; &gt; jq.test&#39;</span> </pre></div> <p>Now to my surprise we're already beating the non-path-aware control implementation! When I first wrote the path-aware version, it was slower than the control. So I had to start performance profiling. For this blog post I tried to remake the slowest variation I could remember but I couldn't get it slower than this.</p> <p>That said, the best version <em>was</em> faster than this so I <em>can</em> demonstrate the process of profiling to improve performance.</p> <p>Let's dig in. :)</p> <h3 id="profiling-in-go">Profiling in Go</h3><p>There are various ways to enable profiling in Go. One way some people recommend is through the <a href="https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go">builtin benchmark support</a> in <code>go test</code>. I don't really like this method though. 
I prefer to use <a href="https://github.com/pkg/profile">pkg/profile</a> manually in <code>main.go</code>.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -9,6 +9,8 @@</span> <span class="w"> </span> &quot;os&quot; <span class="w"> </span> &quot;strconv&quot; <span class="w"> </span> &quot;strings&quot; <span class="gi">+</span> <span class="gi">+ &quot;github.com/pkg/profile&quot;</span> <span class="w"> </span>) <span class="w"> </span>type jsonReader struct { <span class="gu">@@ -450,6 +452,7 @@</span> <span class="w"> </span>} <span class="w"> </span>func main() { <span class="gi">+ defer profile.Start().Stop()</span> <span class="w"> </span> path := strings.Split(os.Args[1], &quot;.&quot;) <span class="w"> </span> if path[0] == &quot;&quot; { <span class="w"> </span> path = path[1:] </pre></div> <p>Build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy $<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>/dev/null <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">02</span>:38:57<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>enabled,<span class="w"> </span>/tmp/profile3691177944/cpu.pprof <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">02</span>:38:58<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile3691177944/cpu.pprof </pre></div> <p>Go can <a href="https://www.honeycomb.io/blog/golang-observability-using-the-new-pprof-web-ui-to-debug-memory-usage/">run a web server</a> 
to visualize the pprof results but I find (after literally a few years of trying to figure it out) the CLI makes more sense to me.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">tool</span><span class="w"> </span><span class="nx">pprof</span><span class="w"> </span><span class="o">/</span><span class="nx">tmp</span><span class="o">/</span><span class="nx">profile3691177944</span><span class="o">/</span><span class="nx">cpu</span><span class="p">.</span><span class="nx">pprof</span> <span class="nx">File</span><span class="p">:</span><span class="w"> </span><span class="nx">jqgo</span> <span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">cpu</span> <span class="nx">Time</span><span class="p">:</span><span class="w"> </span><span class="nx">Jul</span><span class="w"> </span><span class="mi">11</span><span class="p">,</span><span class="w"> </span><span class="mi">2022</span><span class="w"> </span><span class="nx">at</span><span class="w"> </span><span class="mi">2</span><span class="p">:</span><span class="mi">38</span><span class="nx">am</span><span class="w"> </span><span class="p">(</span><span class="nx">UTC</span><span class="p">)</span> <span class="nx">Duration</span><span class="p">:</span><span class="w"> </span><span class="mf">401.63</span><span class="nx">ms</span><span class="p">,</span><span class="w"> </span><span class="nx">Total</span><span class="w"> </span><span class="nx">samples</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">270</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="mf">67.23</span><span class="o">%</span><span class="p">)</span> <span class="nx">Entering</span><span class="w"> </span><span class="nx">interactive</span><span class="w"> </span><span class="nx">mode</span><span class="w"> 
</span><span class="p">(</span><span class="kd">type</span><span class="w"> </span><span class="s">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">commands</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;o&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span> <span class="p">(</span><span class="nx">pprof</span><span class="p">)</span> </pre></div> <p>Now we run <code>top10</code> to see where we spend the bulk of time.</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">top10</span> <span class="nx">Showing</span><span class="w"> </span><span class="nx">nodes</span><span class="w"> </span><span class="nx">accounting</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span><span class="w"> </span><span class="nx">total</span> <span class="nx">Showing</span><span class="w"> </span><span class="nx">top</span><span class="w"> </span><span class="mi">10</span><span class="w"> </span><span class="nx">nodes</span><span class="w"> </span><span class="nx">out</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="mi">31</span> <span class="w"> </span><span class="nx">flat</span><span class="w"> </span><span class="nx">flat</span><span class="o">%</span><span class="w"> </span><span class="nx">sum</span><span class="o">%</span><span class="w"> </span><span class="nx">cum</span><span class="w"> </span><span class="nx">cum</span><span class="o">%</span> 
<span class="w"> </span><span class="mi">90</span><span class="nx">ms</span><span class="w"> </span><span class="mf">34.62</span><span class="o">%</span><span class="w"> </span><span class="mf">34.62</span><span class="o">%</span><span class="w"> </span><span class="mi">230</span><span class="nx">ms</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">eatValue</span> <span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mf">23.08</span><span class="o">%</span><span class="w"> </span><span class="mf">57.69</span><span class="o">%</span><span class="w"> </span><span class="mi">70</span><span class="nx">ms</span><span class="w"> </span><span class="mf">26.92</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Peek</span> <span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mf">19.23</span><span class="o">%</span><span class="w"> </span><span class="mf">76.92</span><span class="o">%</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mf">23.08</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Discard</span> <span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mf">7.69</span><span class="o">%</span><span class="w"> </span><span class="mf">84.62</span><span class="o">%</span><span class="w"> </span><span 
class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mf">7.69</span><span class="o">%</span><span class="w"> </span><span class="nx">syscall</span><span class="p">.</span><span class="nx">Syscall</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Buffered</span><span class="w"> </span><span class="p">(</span><span class="nx">inline</span><span class="p">)</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">92.31</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">readByte</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mf">96.15</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">slicebytetostring</span> <span class="w"> 
</span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">stkbucket</span> <span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">0</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">fill</span> <span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="mi">0</span><span class="o">%</span><span class="w"> </span><span class="mi">100</span><span class="o">%</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mf">3.85</span><span class="o">%</span><span class="w"> </span><span class="nx">encoding</span><span class="o">/</span><span class="nx">json</span><span class="p">.(</span><span class="o">*</span><span class="nx">Encoder</span><span class="p">).</span><span class="nx">Encode</span> </pre></div> <p>Now this is weird. Why are <code>Peek</code> and <code>Discard</code> so expensive? And why are we spending so much time in <code>syscall.Syscall</code>? 
The entire point of buffered I/O is to avoid hitting syscalls too frequently.</p> <p>But since 88% of the time is spent in <code>eatValue</code>, let's verify where in <code>eatValue</code> we are spending that time.</p> <p>Within the <code>pprof</code> REPL we can enter <code>list X</code> where <code>X</code> is a regexp matching a function name.</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">eatValue</span> <span class="nx">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span> <span class="nx">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="nx">main</span><span class="p">.(</span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">).</span><span class="nx">eatValue</span><span class="w"> </span><span class="nx">in</span><span class="w"> </span><span class="o">/</span><span class="nx">home</span><span class="o">/</span><span class="nx">phil</span><span class="o">/</span><span class="nx">tmp</span><span class="o">/</span><span class="nx">jqgo</span><span class="o">/</span><span class="nx">mainprof</span><span class="p">.</span><span class="k">go</span> <span class="w"> </span><span class="mi">90</span><span class="nx">ms</span><span class="w"> </span><span class="mi">230</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="nx">flat</span><span class="p">,</span><span class="w"> </span><span class="nx">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">88.46</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">Total</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span
class="w"> </span><span class="mi">159</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">160</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">161</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">162</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">163</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">164</span><span class="p">:</span><span class="w"> </span><span class="nx">ok</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span 
class="nx">tryScalar</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">165</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">166</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">167</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">168</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">169</span><span class="p">:</span><span class="w"> </span><span class="c1">// It was a scalar, we&#39;re done!</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">170</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">171</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">172</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">173</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">174</span><span class="p">:</span><span class="w"> </span><span class="c1">// Otherwise it&#39;s an array or object</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">175</span><span class="p">:</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">176</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">177</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">first</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">178</span><span class="p">:</span><span class="w"> </span><span 
class="nx">first</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">179</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">180</span><span class="p">:</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Peek</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">181</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">182</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">183</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">184</span><span class="p">:</span><span 
class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">185</span><span class="p">:</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">186</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">187</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">188</span><span class="p">:</span><span class="w"> </span><span class="nx">inString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">189</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">190</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">191</span><span class="p">:</span><span class="w"> </span><span class="c1">// Two \\-es cancel eachother out</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">192</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">193</span><span class="p">:</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">194</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">195</span><span class="p">:</span><span class="w"> 
</span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">196</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">197</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">198</span><span class="p">:</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">Discard</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">199</span><span class="p">:</span><span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">200</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">201</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">202</span><span class="p">:</span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">203</span><span 
class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">204</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">205</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;]&#39;</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">206</span><span class="p">:</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">207</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span 
class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">208</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">209</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unexpected end of array: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">210</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">211</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">212</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">stack</span><span class="p">,</span><span class="w"> 
</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">213</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;}&#39;</span><span class="p">:</span> <span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mi">50</span><span class="nx">ms</span><span class="w"> </span><span class="mi">214</span><span class="p">:</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">215</span><span class="p">:</span><span class="w"> </span><span class="nx">stack</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stack</span><span class="p">[:</span><span class="nb">len</span><span class="p">(</span><span class="nx">stack</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">216</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span 
class="w"> </span><span class="mi">217</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unexpected end of object: &#39;%s&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">218</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">219</span><span class="p">:</span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="p">:</span> </pre></div> <p>So ranking the lines by time, we can see we do spend the most in <code>Peek</code> and <code>Discard</code> (60ms each). Next comes pulling the last item off the stack (50ms)??? That's weird. 
Let's ignore that.</p> <h3 id="peek-and-discard">Peek and Discard</h3><p>Let's look at <code>Peek</code> in the pprof REPL:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nx">pprof</span><span class="p">)</span><span class="w"> </span><span class="nx">list</span><span class="w"> </span><span class="nx">Peek</span> <span class="nx">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">260</span><span class="nx">ms</span> <span class="nx">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.(</span><span class="o">*</span><span class="nx">Reader</span><span class="p">).</span><span class="nx">Peek</span><span class="w"> </span><span class="nx">in</span><span class="w"> </span><span class="o">/</span><span class="nx">usr</span><span class="o">/</span><span class="nx">local</span><span class="o">/</span><span class="k">go</span><span class="o">/</span><span class="nx">src</span><span class="o">/</span><span class="nx">bufio</span><span class="o">/</span><span class="nx">bufio</span><span class="p">.</span><span class="k">go</span> <span class="w"> </span><span class="mi">60</span><span class="nx">ms</span><span class="w"> </span><span class="mi">70</span><span class="nx">ms</span><span class="w"> </span><span class="p">(</span><span class="nx">flat</span><span class="p">,</span><span class="w"> </span><span class="nx">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">26.92</span><span class="o">%</span><span class="w"> </span><span class="nx">of</span><span class="w"> </span><span class="nx">Total</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">130</span><span class="p">:</span><span class="c1">// also returns an error explaining why the read is short. 
The error is</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">131</span><span class="p">:</span><span class="c1">// ErrBufferFull if n is larger than b&#39;s buffer size.</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">132</span><span class="p">:</span><span class="c1">//</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">133</span><span class="p">:</span><span class="c1">// Calling Peek prevents a UnreadByte or UnreadRune call from succeeding</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">134</span><span class="p">:</span><span class="c1">// until the next read operation.</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">135</span><span class="p">:</span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="o">*</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="nx">Peek</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">136</span><span class="p">:</span><span class="w"> 
</span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">137</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrNegativeCount</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">138</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">139</span><span class="p">:</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">140</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">lastByte</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">141</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">lastRuneSize</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">142</span><span 
class="p">:</span> <span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">20</span><span class="nx">ms</span><span class="w"> </span><span class="mi">143</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="o">-</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="o">-</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">144</span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">fill</span><span class="p">()</span><span class="w"> </span><span class="c1">// b.w-b.r &lt; len(b.buf) =&gt; buffer is not full</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">145</span><span class="p">:</span><span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">146</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">147</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">148</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="p">:</span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="p">],</span><span class="w"> </span><span class="nx">ErrBufferFull</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">149</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">150</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">151</span><span class="p">:</span><span class="w"> </span><span class="c1">// 0 &lt;= n &lt;= len(b.buf)</span> 
<span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">152</span><span class="p">:</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">153</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">avail</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">w</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="p">;</span><span class="w"> </span><span class="nx">avail</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">154</span><span class="p">:</span><span class="w"> </span><span class="c1">// not enough data in buffer</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">155</span><span class="p">:</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">avail</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">156</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">b</span><span class="p">.</span><span class="nx">readErr</span><span class="p">()</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">157</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">158</span><span class="p">:</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ErrBufferFull</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">159</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">160</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">10</span><span class="nx">ms</span><span class="w"> </span><span class="mi">161</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span class="w"> </span><span class="p">:</span><span class="w"> </span><span class="nx">b</span><span class="p">.</span><span class="nx">r</span><span 
class="o">+</span><span class="nx">n</span><span class="p">],</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">162</span><span class="p">:}</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">163</span><span class="p">:</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">164</span><span class="p">:</span><span class="c1">// Discard skips the next n bytes, returning the number of bytes discarded.</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">165</span><span class="p">:</span><span class="c1">//</span> <span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="mi">166</span><span class="p">:</span><span class="c1">// If Discard skips fewer than n bytes, it also returns an error.</span> </pre></div> <p>The bulk of time here is spent refilling the buffer (the <code>fill</code> method). So while <code>bufio.Reader</code> buffers <em>reads</em>, it seems to basically not buffer <em>peeks</em>.</p> <p>But hey, we were peeking and discarding one byte at a time anyway. Peeking and discarding had the same cost in <code>eatValue</code>. So let's ignore peeking for a second and think about discarding.</p> <p>We could avoid doing so many discards if we just keep track of how much we have peeked at in the loop and only discard once at the end of the loop. 
(As an implementation detail, since there's a maximum internal buffer size, we'll still need to discard periodically: whenever we try to peek and get a "buffer full" error.)</p> <p>And based on the <code>top10</code> result above, we need to do this in <code>eatValue</code>.</p> <div class="highlight"><pre><span></span><span class="gu">@@ -170,16 +170,31 @@</span> <span class="w"> </span> } <span class="w"> </span> // Otherwise it&#39;s an array or object <span class="gi">+ length := 0</span> <span class="w"> </span> first := true <span class="gd">-</span> <span class="gi">+ var bs []byte</span> <span class="w"> </span> for first || len(stack) &gt; 0 { <span class="gi">+ length++</span> <span class="w"> </span> first = false <span class="gd">- bs, err := r.Peek(1)</span> <span class="gd">- if err != nil {</span> <span class="gd">- return err</span> <span class="gi">+ for {</span> <span class="gi">+ bs, err = r.Peek(length)</span> <span class="gi">+ if err == bufio.ErrBufferFull {</span> <span class="gi">+ _, err = r.Discard(length - 1)</span> <span class="gi">+ if err != nil {</span> <span class="gi">+ return err</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ length = 1</span> <span class="gi">+ continue</span> <span class="gi">+ }</span> <span class="gi">+ if err != nil {</span> <span class="gi">+ return err</span> <span class="gi">+ }</span> <span class="gi">+</span> <span class="gi">+ break</span> <span class="w"> </span> } <span class="gd">- b := bs[0]</span> <span class="gi">+ b := bs[length-1]</span> <span class="w"> </span> if inString { <span class="w"> </span> if b == &#39;&quot;&#39; &amp;&amp; prev != &#39;\\&#39; { <span class="gu">@@ -193,7 +208,6 @@</span> <span class="w"> </span> prev = b <span class="w"> </span> } <span class="gd">- r.Discard(1)</span> <span class="w"> </span> continue <span class="w"> </span> } <span class="gu">@@ -219,11 +233,11 @@</span> <span class="w"> </span> // Closing quote case handled elsewhere, 
above <span class="w"> </span> } <span class="gd">- r.Discard(1)</span> <span class="w"> </span> prev = b <span class="w"> </span> } <span class="gd">- return nil</span> <span class="gi">+ _, err = r.Discard(length)</span> <span class="gi">+ return err</span> <span class="w"> </span>} <span class="w"> </span>func (jr *jsonReader) tryScalar(r *bufio.Reader) (bool, any, error) { </pre></div> <p>Comment out the <code>pkg/profile</code> bits (profiling slows the whole thing down), rebuild, and rerun:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./control/control &#39;.repo.url&#39; &gt; control.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./jqgo &#39;.repo.url&#39; &gt; jqgo.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | jq &#39;.repo.url&#39; &gt; jq.test&quot;</span> Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>control.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">302</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">4</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">287</span>.7<span class="w"> 
</span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.7<span class="w"> </span>ms<span class="o">]</span><span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.6<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">308</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jqgo.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">215</span>.0<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">189</span>.1<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">46</span>.9<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">213</span>.5<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">218</span>.7<span class="w"> </span>ms<span class="w"> </span><span class="m">13</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span 
class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">355</span>.7<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">1</span>.4<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">349</span>.9<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">26</span>.4<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.3<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">359</span>.1<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Summary <span class="w"> </span><span class="s1">&#39;cat large-file.json | ./jqgo &#39;</span>.repo.url<span class="s1">&#39; &gt; jqgo.test&#39;</span><span class="w"> </span>ran <span class="w"> </span><span class="m">1</span>.40<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | ./control/control &#39;</span>.repo.url<span class="s1">&#39; &gt; control.test&#39;</span> <span class="w"> </span><span class="m">1</span>.65<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.01<span class="w"> </span><span 
class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | jq &#39;</span>.repo.url<span class="s1">&#39; &gt; jq.test&#39;</span> </pre></div> <p>Great! We've shaved off another 40ms. Let's enable profiling, re-run the program and go back into the pprof REPL.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>/dev/null <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:12:07<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>enabled,<span class="w"> </span>/tmp/profile2229743747/cpu.pprof <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:12:07<span class="w"> </span>profile:<span class="w"> </span>cpu<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile2229743747/cpu.pprof $<span class="w"> </span>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>/tmp/profile2229743747/cpu.pprof File:<span class="w"> </span>jqgo Type:<span class="w"> </span>cpu Time:<span class="w"> </span>Jul<span class="w"> </span><span class="m">11</span>,<span class="w"> </span><span class="m">2022</span><span class="w"> </span>at<span class="w"> </span><span class="m">3</span>:12am<span class="w"> </span><span class="o">(</span>UTC<span class="o">)</span> Duration:<span class="w"> </span><span class="m">401</span>.33ms,<span class="w"> </span>Total<span class="w"> </span><span class="nv">samples</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>210ms<span class="w"> </span><span class="o">(</span><span 
class="m">52</span>.33%<span class="o">)</span> Entering<span class="w"> </span>interactive<span class="w"> </span>mode<span class="w"> </span><span class="o">(</span><span class="nb">type</span><span class="w"> </span><span class="s2">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>commands,<span class="w"> </span><span class="s2">&quot;o&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>options<span class="o">)</span> <span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>top10 Showing<span class="w"> </span>nodes<span class="w"> </span>accounting<span class="w"> </span><span class="k">for</span><span class="w"> </span>210ms,<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>of<span class="w"> </span>210ms<span class="w"> </span>total Showing<span class="w"> </span>top<span class="w"> </span><span class="m">10</span><span class="w"> </span>nodes<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">20</span> <span class="w"> </span>flat<span class="w"> </span>flat%<span class="w"> </span>sum%<span class="w"> </span>cum<span class="w"> </span>cum% <span class="w"> </span>100ms<span class="w"> </span><span class="m">47</span>.62%<span class="w"> </span><span class="m">47</span>.62%<span class="w"> </span>180ms<span class="w"> </span><span class="m">85</span>.71%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue <span class="w"> </span>70ms<span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span><span class="m">80</span>.95%<span class="w"> </span>70ms<span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span>bufio.<span class="o">(</span>*Reader<span class="o">)</span>.Peek <span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">85</span>.71%<span 
class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*encodeState<span class="o">)</span>.string <span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">90</span>.48%<span class="w"> </span>20ms<span class="w"> </span><span class="m">9</span>.52%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.expectString <span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">95</span>.24%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.readByte <span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>reflect.Value.Type <span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*Encoder<span class="o">)</span>.Encode <span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.literalStore <span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> 
</span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.unmarshal <span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">100</span>%<span class="w"> </span>10ms<span class="w"> </span><span class="m">4</span>.76%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.value </pre></div> <p>Nice, <code>syscall.Syscall</code> is no longer in the top 10. But <code>eatValue</code> is, and we're still spending a bunch of time in <code>Peek</code>. That makes sense: we didn't try to stop calling <code>Peek</code> so much; we just cut down on calling <code>Discard</code>.</p> <p>List <code>eatValue</code>.</p> <div class="highlight"><pre><span></span><span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>list<span class="w"> </span>eatValue Total:<span class="w"> </span>210ms <span class="nv">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue<span class="w"> </span><span class="k">in</span><span class="w"> </span>/home/phil/tmp/jqgo/mainpeek.go <span class="w"> </span>100ms<span class="w"> </span>180ms<span class="w"> </span><span class="o">(</span>flat,<span class="w"> </span>cum<span class="o">)</span><span class="w"> </span><span class="m">85</span>.71%<span class="w"> </span>of<span class="w"> </span>Total <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">159</span>:<span class="w"> </span>err<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>jr.eatWhitespace<span class="o">(</span>r<span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">160</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span 
class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">161</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">162</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">163</span>: <span class="w"> </span>.<span class="w"> </span>20ms<span class="w"> </span><span class="m">164</span>:<span class="w"> </span>ok,<span class="w"> </span>_,<span class="w"> </span>err<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>jr.tryScalar<span class="o">(</span>r<span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">165</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">166</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">167</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">168</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">169</span>:<span class="w"> </span>//<span class="w"> </span>It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>scalar,<span class="w"> </span>we<span class="s1">&#39;re done!</span> <span class="s1"> . . 170: if ok {</span> <span class="s1"> . . 171: return nil</span> <span class="s1"> . . 
172: }</span> <span class="s1"> . . 173:</span> <span class="s1"> . . 174: // Otherwise it&#39;</span>s<span class="w"> </span>an<span class="w"> </span>array<span class="w"> </span>or<span class="w"> </span>object <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">175</span>:<span class="w"> </span>length<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">176</span>:<span class="w"> </span>first<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="nb">true</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">177</span>:<span class="w"> </span>var<span class="w"> </span>bs<span class="w"> </span><span class="o">[]</span>byte <span class="w"> </span>20ms<span class="w"> </span>20ms<span class="w"> </span><span class="m">178</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>first<span class="w"> </span><span class="o">||</span><span class="w"> </span>len<span class="o">(</span>stack<span class="o">)</span><span class="w"> </span>&gt;<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">179</span>:<span class="w"> </span>length++ <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">180</span>:<span class="w"> </span><span class="nv">first</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">181</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">182</span>:<span class="w"> </span><span class="k">for</span><span class="w"> 
</span><span class="o">{</span> <span class="w"> </span>20ms<span class="w"> </span>80ms<span class="w"> </span><span class="m">183</span>:<span class="w"> </span>bs,<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>r.Peek<span class="o">(</span>length<span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">184</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>bufio.ErrBufferFull<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">185</span>:<span class="w"> </span>_,<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>r.Discard<span class="o">(</span>length<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">186</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">187</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">188</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">189</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">190</span>:<span class="w"> </span><span class="nv">length</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="m">1</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">191</span>:<span class="w"> </span><span class="k">continue</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">192</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">193</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>err<span class="w"> </span>!<span class="o">=</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">194</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>err <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">195</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">196</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">197</span>:<span class="w"> </span><span class="k">break</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">198</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">199</span>:<span class="w"> </span>b<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>bs<span class="o">[</span>length-1<span class="o">]</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">200</span>: <span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">201</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>inString<span class="w"> </span><span 
class="o">{</span> <span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">202</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>prev<span class="w"> </span>!<span class="o">=</span><span class="w"> </span><span class="s1">&#39;\\&#39;</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="m">203</span>:<span class="w"> </span><span class="nv">inString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">204</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">205</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">206</span>:<span class="w"> </span>//<span class="w"> </span>Two<span class="w"> </span><span class="se">\\</span>-es<span class="w"> </span>cancel<span class="w"> </span>eachother<span class="w"> </span>out <span class="w"> </span>20ms<span class="w"> </span>20ms<span class="w"> </span><span class="m">207</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;\\&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;\\&#39;</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span 
class="m">208</span>:<span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>byte<span class="o">(</span><span class="m">0</span><span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">209</span>:<span class="w"> </span><span class="o">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">210</span>:<span class="w"> </span><span class="nv">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>b <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">211</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>. </pre></div> <p>The bulk of time is spent in <code>Peek</code>. Let's list <code>Peek</code> again.</p> <div class="highlight"><pre><span></span><span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>list<span class="w"> </span>Peek Total:<span class="w"> </span>210ms <span class="nv">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span>bufio.<span class="o">(</span>*Reader<span class="o">)</span>.Peek<span class="w"> </span><span class="k">in</span><span class="w"> </span>/usr/local/go/src/bufio/bufio.go <span class="w"> </span>70ms<span class="w"> </span>70ms<span class="w"> </span><span class="o">(</span>flat,<span class="w"> </span>cum<span class="o">)</span><span class="w"> </span><span class="m">33</span>.33%<span class="w"> </span>of<span class="w"> </span>Total <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">130</span>://<span class="w"> </span>also<span class="w"> </span>returns<span class="w"> </span>an<span class="w"> </span>error<span class="w"> </span>explaining<span class="w"> 
</span>why<span class="w"> </span>the<span class="w"> </span><span class="nb">read</span><span class="w"> </span>is<span class="w"> </span>short.<span class="w"> </span>The<span class="w"> </span>error<span class="w"> </span>is <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">131</span>://<span class="w"> </span>ErrBufferFull<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>is<span class="w"> </span>larger<span class="w"> </span>than<span class="w"> </span>b<span class="err">&#39;</span>s<span class="w"> </span>buffer<span class="w"> </span>size. <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">132</span>:// <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">133</span>://<span class="w"> </span>Calling<span class="w"> </span>Peek<span class="w"> </span>prevents<span class="w"> </span>a<span class="w"> </span>UnreadByte<span class="w"> </span>or<span class="w"> </span>UnreadRune<span class="w"> </span>call<span class="w"> </span>from<span class="w"> </span>succeeding <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">134</span>://<span class="w"> </span><span class="k">until</span><span class="w"> </span>the<span class="w"> </span>next<span class="w"> </span><span class="nb">read</span><span class="w"> </span>operation. 
<span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">135</span>:func<span class="w"> </span><span class="o">(</span>b<span class="w"> </span>*Reader<span class="o">)</span><span class="w"> </span>Peek<span class="o">(</span>n<span class="w"> </span>int<span class="o">)</span><span class="w"> </span><span class="o">([]</span>byte,<span class="w"> </span>error<span class="o">)</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">136</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>&lt;<span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">137</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>nil,<span class="w"> </span>ErrNegativeCount <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">138</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">139</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">140</span>:<span class="w"> </span>b.lastByte<span class="w"> </span><span class="o">=</span><span class="w"> </span>-1 <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">141</span>:<span class="w"> </span>b.lastRuneSize<span class="w"> </span><span class="o">=</span><span class="w"> </span>-1 <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">142</span>: <span class="w"> </span>10ms<span class="w"> </span>10ms<span class="w"> </span><span class="m">143</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>b.w-b.r<span class="w"> </span>&lt;<span 
class="w"> </span>n<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>b.w-b.r<span class="w"> </span>&lt;<span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>b.err<span class="w"> </span><span class="o">==</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">144</span>:<span class="w"> </span>b.fill<span class="o">()</span><span class="w"> </span>//<span class="w"> </span>b.w-b.r<span class="w"> </span>&lt;<span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">=</span>&gt;<span class="w"> </span>buffer<span class="w"> </span>is<span class="w"> </span>not<span class="w"> </span>full <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">145</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">146</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">147</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>&gt;<span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">148</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>b.buf<span class="o">[</span>b.r:b.w<span class="o">]</span>,<span class="w"> </span>ErrBufferFull <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">149</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> 
</span><span class="m">150</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">151</span>:<span class="w"> </span>//<span class="w"> </span><span class="m">0</span><span class="w"> </span>&lt;<span class="o">=</span><span class="w"> </span>n<span class="w"> </span>&lt;<span class="o">=</span><span class="w"> </span>len<span class="o">(</span>b.buf<span class="o">)</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">152</span>:<span class="w"> </span>var<span class="w"> </span>err<span class="w"> </span>error <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">153</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span>avail<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>b.w<span class="w"> </span>-<span class="w"> </span>b.r<span class="p">;</span><span class="w"> </span>avail<span class="w"> </span>&lt;<span class="w"> </span>n<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">154</span>:<span class="w"> </span>//<span class="w"> </span>not<span class="w"> </span>enough<span class="w"> </span>data<span class="w"> </span><span class="k">in</span><span class="w"> </span>buffer <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">155</span>:<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>avail <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">156</span>:<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>b.readErr<span class="o">()</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">157</span>:<span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>nil<span class="w"> </span><span class="o">{</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">158</span>:<span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ErrBufferFull <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">159</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">160</span>:<span class="w"> </span><span class="o">}</span> <span class="w"> </span>50ms<span class="w"> </span>50ms<span class="w"> </span><span class="m">161</span>:<span class="w"> </span><span class="k">return</span><span class="w"> </span>b.buf<span class="o">[</span>b.r<span class="w"> </span>:<span class="w"> </span>b.r+n<span class="o">]</span>,<span class="w"> </span>err <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">162</span>:<span class="o">}</span> <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">163</span>: <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">164</span>://<span class="w"> </span>Discard<span class="w"> </span>skips<span class="w"> </span>the<span class="w"> </span>next<span class="w"> </span>n<span class="w"> </span>bytes,<span class="w"> </span>returning<span class="w"> </span>the<span class="w"> </span>number<span class="w"> </span>of<span class="w"> </span>bytes<span class="w"> </span>discarded. 
<span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">165</span>:// <span class="w"> </span>.<span class="w"> </span>.<span class="w"> </span><span class="m">166</span>://<span class="w"> </span>If<span class="w"> </span>Discard<span class="w"> </span>skips<span class="w"> </span>fewer<span class="w"> </span>than<span class="w"> </span>n<span class="w"> </span>bytes,<span class="w"> </span>it<span class="w"> </span>also<span class="w"> </span>returns<span class="w"> </span>an<span class="w"> </span>error. </pre></div> <p>Well, it's not really clear to me from this why we spend so much time (50ms of the 70ms) on the slice expression in the final <code>return</code>.</p> <p>We might be able to call <code>Peek</code> much less often if we kept our own FIFO queue of peeked-at bytes. But I don't feel like writing a correct, efficient FIFO queue (a ring buffer, basically) and maybe there are other aspects of this program we can look at. So let's give this train of thought a break.</p> <h3 id="memory-profiling">Memory profiling</h3><p>Let's change tactics entirely. Memory allocation tends to be expensive. Allocating in a loop is generally a bad idea. And this entire program is a loop. So let's try doing a memory profile instead of a CPU profile.</p> <p>Instead of <code>defer profile.Start().Stop()</code> we'll use <code>defer profile.Start(profile.MemProfile).Stop()</code>.</p> <p>Build, rerun and enter pprof with the <code>-alloc_objects</code> flag. 
We want to see where memory is being allocated.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>/dev/null <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:24:55<span class="w"> </span>profile:<span class="w"> </span>memory<span class="w"> </span>profiling<span class="w"> </span>enabled<span class="w"> </span><span class="o">(</span>rate<span class="w"> </span><span class="m">4096</span><span class="o">)</span>,<span class="w"> </span>/tmp/profile1407859643/mem.pprof <span class="m">2022</span>/07/11<span class="w"> </span><span class="m">03</span>:24:56<span class="w"> </span>profile:<span class="w"> </span>memory<span class="w"> </span>profiling<span class="w"> </span>disabled,<span class="w"> </span>/tmp/profile1407859643/mem.pprof $<span class="w"> </span>go<span class="w"> </span>tool<span class="w"> </span>pprof<span class="w"> </span>-alloc_objects<span class="w"> </span>/tmp/profile1407859643/mem.pprof File:<span class="w"> </span>jqgo Type:<span class="w"> </span>alloc_objects Time:<span class="w"> </span>Jul<span class="w"> </span><span class="m">11</span>,<span class="w"> </span><span class="m">2022</span><span class="w"> </span>at<span class="w"> </span><span class="m">3</span>:24am<span class="w"> </span><span class="o">(</span>UTC<span class="o">)</span> Entering<span class="w"> </span>interactive<span class="w"> </span>mode<span class="w"> </span><span class="o">(</span><span class="nb">type</span><span class="w"> </span><span class="s2">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>commands,<span class="w"> </span><span 
class="s2">&quot;o&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>options<span class="o">)</span> <span class="o">(</span>pprof<span class="o">)</span><span class="w"> </span>top10 Showing<span class="w"> </span>nodes<span class="w"> </span>accounting<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="m">365899</span>,<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span>of<span class="w"> </span><span class="m">366086</span><span class="w"> </span>total Dropped<span class="w"> </span><span class="m">24</span><span class="w"> </span>nodes<span class="w"> </span><span class="o">(</span>cum<span class="w"> </span>&lt;<span class="o">=</span><span class="w"> </span><span class="m">1830</span><span class="o">)</span> Showing<span class="w"> </span>top<span class="w"> </span><span class="m">10</span><span class="w"> </span>nodes<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">14</span> <span class="w"> </span>flat<span class="w"> </span>flat%<span class="w"> </span>sum%<span class="w"> </span>cum<span class="w"> </span>cum% <span class="w"> </span><span class="m">227585</span><span class="w"> </span><span class="m">62</span>.17%<span class="w"> </span><span class="m">62</span>.17%<span class="w"> </span><span class="m">262708</span><span class="w"> </span><span class="m">71</span>.76%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.expectString <span class="w"> </span><span class="m">40945</span><span class="w"> </span><span class="m">11</span>.18%<span class="w"> </span><span class="m">73</span>.35%<span class="w"> </span><span class="m">40945</span><span class="w"> </span><span class="m">11</span>.18%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.readByte <span class="w"> </span><span class="m">39500</span><span class="w"> </span><span 
class="m">10</span>.79%<span class="w"> </span><span class="m">84</span>.14%<span class="w"> </span><span class="m">252585</span><span class="w"> </span><span class="m">69</span>.00%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.tryScalar <span class="w"> </span><span class="m">30009</span><span class="w"> </span><span class="m">8</span>.20%<span class="w"> </span><span class="m">92</span>.34%<span class="w"> </span><span class="m">41924</span><span class="w"> </span><span class="m">11</span>.45%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.tryNumber <span class="w"> </span><span class="m">12055</span><span class="w"> </span><span class="m">3</span>.29%<span class="w"> </span><span class="m">95</span>.63%<span class="w"> </span><span class="m">215416</span><span class="w"> </span><span class="m">58</span>.84%<span class="w"> </span>main.<span class="o">(</span>*jsonReader<span class="o">)</span>.eatValue <span class="w"> </span><span class="m">7555</span><span class="w"> </span><span class="m">2</span>.06%<span class="w"> </span><span class="m">97</span>.70%<span class="w"> </span><span class="m">11915</span><span class="w"> </span><span class="m">3</span>.25%<span class="w"> </span>encoding/json.Unmarshal <span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span><span class="m">98</span>.89%<span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.literalStore <span class="w"> </span><span class="m">3847</span><span class="w"> </span><span class="m">1</span>.05%<span class="w"> </span><span class="m">99</span>.94%<span class="w"> </span><span class="m">3847</span><span class="w"> </span><span class="m">1</span>.05%<span class="w"> </span>main.<span 
class="o">(</span>*jsonReader<span class="o">)</span>.expectIdentifier <span class="w"> </span><span class="m">43</span><span class="w"> </span><span class="m">0</span>.012%<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span><span class="m">365931</span><span class="w"> </span><span class="m">100</span>%<span class="w"> </span>runtime.main <span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span>%<span class="w"> </span><span class="m">99</span>.95%<span class="w"> </span><span class="m">4360</span><span class="w"> </span><span class="m">1</span>.19%<span class="w"> </span>encoding/json.<span class="o">(</span>*decodeState<span class="o">)</span>.unmarshal </pre></div> <p>And just like in the CPU profile we can list functions to see where the allocations happen in code. Let's list the function that allocates the most objects here, <code>expectString</code>.</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="n">pprof</span><span class="p">)</span><span class="w"> </span><span class="n">list</span><span class="w"> </span><span class="n">expectString</span> <span class="n">Total</span><span class="p">:</span><span class="w"> </span><span class="mi">366086</span> <span class="n">ROUTINE</span><span class="w"> </span><span class="o">========================</span><span class="w"> </span><span class="n">main</span><span class="o">.</span><span class="p">(</span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="o">.</span><span class="n">expectString</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="o">/</span><span class="n">home</span><span class="o">/</span><span class="n">phil</span><span class="o">/</span><span class="n">tmp</span><span class="o">/</span><span class="n">jqgo</span><span class="o">/</span><span class="n">mainpeek</span><span class="o">.</span><span class="n">go</span> <span class="w"> 
</span><span class="mi">227585</span><span class="w"> </span><span class="mi">262708</span><span class="w"> </span><span class="p">(</span><span class="n">flat</span><span class="p">,</span><span class="w"> </span><span class="n">cum</span><span class="p">)</span><span class="w"> </span><span class="mf">71.76</span><span class="o">%</span><span class="w"> </span><span class="n">of</span><span class="w"> </span><span class="n">Total</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">58</span><span class="p">:</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">eatWhitespace</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">59</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">60</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">61</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span 
class="o">.</span><span class="w"> </span><span class="mi">62</span><span class="p">:</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">4941</span><span class="w"> </span><span class="mi">63</span><span class="p">:</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">64</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">65</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">66</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">67</span><span class="p">:</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">68</span><span class="p">:</span><span class="w"> 
</span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">69</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">&quot;Expected double quote to start string, got: &#39;</span><span class="si">%s</span><span class="s2">&#39;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">b</span><span class="p">))</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">70</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">71</span><span class="p">:</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">72</span><span class="p">:</span><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="n">byte</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">73</span><span class="p">:</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="o">.</span><span class="w"> </span><span class="mi">30182</span><span class="w"> </span><span class="mi">74</span><span class="p">:</span><span class="w"> </span><span class="n">b</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">jr</span><span class="o">.</span><span class="n">readByte</span><span class="p">(</span><span class="n">r</span><span class="p">)</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">75</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">76</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">77</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">78</span><span class="p">:</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">79</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span 
class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">80</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Just</span><span class="w"> </span><span class="n">skip</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">81</span><span class="p">:</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">byte</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">82</span><span class="p">:</span><span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">83</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span 
class="o">.</span><span class="w"> </span><span class="mi">84</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Overwrite</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">escaped</span><span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">quote</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">85</span><span class="p">:</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;</span><span class="se">\\</span><span class="s1">&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">86</span><span class="p">:</span><span class="w"> </span><span class="n">s</span><span class="p">[</span><span class="n">len</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">87</span><span class="p">:</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">88</span><span class="p">:</span><span class="w"> </span><span class="o">//</span><span class="w"> </span><span 
class="n">Otherwise</span><span class="w"> </span><span class="n">it</span><span class="s1">&#39;s the actual end</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">89</span><span class="p">:</span><span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">90</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">91</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">92</span><span class="p">:</span> <span class="w"> </span><span class="mi">146302</span><span class="w"> </span><span class="mi">146302</span><span class="w"> </span><span class="mi">93</span><span class="p">:</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">b</span><span class="p">)</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">94</span><span class="p">:</span><span class="w"> </span><span class="n">prev</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">95</span><span class="p">:</span><span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">96</span><span class="p">:</span> <span class="w"> </span><span class="mi">81283</span><span class="w"> </span><span class="mi">81283</span><span class="w"> </span><span class="mi">97</span><span class="p">:</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">98</span><span class="p">:}</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">99</span><span class="p">:</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">100</span><span class="p">:</span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">jr</span><span class="w"> </span><span class="o">*</span><span class="n">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="n">expectIdentifier</span><span class="p">(</span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">bufio</span><span class="o">.</span><span class="n">Reader</span><span class="p">,</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">any</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="n">any</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">101</span><span class="p">:</span><span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="p">[]</span><span class="n">byte</span> <span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="o">.</span><span class="w"> </span><span class="mi">102</span><span class="p">:</span> </pre></div> <p>And the biggest offender is growing the string! The good thing is that growing this string can be amortized because we can share the underlying string memory across calls on the <code>jsonReader</code> struct. This way, <code>expectString</code> only needs to grow the string when it actually sees a bigger string than we've already seen.</p> <p>The standard library's <a href="https://pkg.go.dev/bytes#Buffer">bytes.Buffer</a> type does exactly this. We can put a <code>bytes.Buffer</code> on the <code>jsonReader</code> struct because this code isn't multithreaded and because <code>expectString</code> doesn't call itself.</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">2</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="err">@@</span> <span class="w"> </span><span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bufio&quot;</span> <span class="o">+</span><span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span>
<span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">13</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">14</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span> <span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">jsonReader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">read</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">expectString_buffer</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">reset</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">51</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="o">+</span><span class="mi">54</span><span class="p">,</span><span class="mi">7</span><span class="w"> </span><span class="err">@@</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">jr</span><span class="w"> </span><span class="o">*</span><span class="nx">jsonReader</span><span class="p">)</span><span class="w"> </span><span class="nx">expectString</span><span class="p">(</span><span 
class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">bufio</span><span class="p">.</span><span class="nx">Reader</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Reset</span><span class="p">()</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">eatWhitespace</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">81</span><span class="p">,</span><span class="mi">18</span><span class="w"> </span><span class="o">+</span><span class="mi">84</span><span class="p">,</span><span class="mi">18</span><span class="w"> </span><span class="err">@@</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Overwrite the escaped double quote</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\\&#39;</span><span class="w"> </span><span class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="nx">s</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span> <span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()[</span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">Len</span><span class="p">()</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Otherwise it&#39;s the actual end</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="o">-</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span 
class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">WriteByte</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="nx">prev</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span> <span class="w"> </span><span class="p">}</span> <span class="o">-</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">jr</span><span class="p">.</span><span class="nx">expectString_buffer</span><span class="p">.</span><span class="nx">String</span><span class="p">(),</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p class="note"> Or instead of sharing memory on the struct, maybe this would be a good place to use <a href="https://pkg.go.dev/sync#Pool">sync.Pool</a>? 
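</p> <p class="note">A minimal sketch of what the <code>sync.Pool</code> variant might look like. The names <code>bufPool</code> and <code>copyWithPool</code> are hypothetical and not part of the jqgo repo; <code>copyWithPool</code> only stands in for the byte-copying loop inside <code>expectString</code>.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable buffers (hypothetical name, for illustration).
var bufPool = sync.Pool{
	// New is called only when the pool has no spare buffer to give out.
	New: func() any { return new(bytes.Buffer) },
}

// copyWithPool copies src into a pooled buffer and returns it as a string.
// It stands in for the append loop in expectString.
func copyWithPool(src []byte) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()            // a recycled buffer may still hold old bytes
	defer bufPool.Put(buf) // hand the buffer back once we are done

	for _, b := range src {
		buf.WriteByte(b)
	}
	// String copies the bytes, so the result stays valid after Put.
	return buf.String()
}

func main() {
	fmt.Println(copyWithPool([]byte("first value")))
	fmt.Println(copyWithPool([]byte("second")))
}
```

<p class="note">A pool mostly pays off when several goroutines need scratch buffers at the same time. This reader is single-threaded, so the field on the struct is the simpler choice, and the pooled version has not been benchmarked here.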
</p><p>Disable <code>pkg/profile</code>, build and rerun with hyperfine.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>hyperfine<span class="w"> </span>--warmup<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./control/control &#39;.repo.url&#39; &gt; control.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | ./jqgo &#39;.repo.url&#39; &gt; jqgo.test&quot;</span><span class="w"> </span><span class="se">\</span> <span class="w"> </span><span class="s2">&quot;cat large-file.json | jq &#39;.repo.url&#39; &gt; jq.test&quot;</span> Benchmark<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./control/control<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>control.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">307</span>.2<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">10</span>.8<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">292</span>.8<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">49</span>.4<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">296</span>.5<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span 
class="m">326</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>./jqgo<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jqgo.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span class="m">210</span>.8<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.2<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">185</span>.4<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">44</span>.9<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">209</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">216</span>.8<span class="w"> </span>ms<span class="w"> </span><span class="m">14</span><span class="w"> </span>runs Benchmark<span class="w"> </span><span class="m">3</span>:<span class="w"> </span>cat<span class="w"> </span>large-file.json<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.repo.url&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>jq.test <span class="w"> </span>Time<span class="w"> </span><span class="o">(</span>mean<span class="w"> </span>±<span class="w"> </span>σ<span class="o">)</span>:<span class="w"> </span><span 
class="m">356</span>.1<span class="w"> </span>ms<span class="w"> </span>±<span class="w"> </span><span class="m">2</span>.6<span class="w"> </span>ms<span class="w"> </span><span class="o">[</span>User:<span class="w"> </span><span class="m">349</span>.1<span class="w"> </span>ms,<span class="w"> </span>System:<span class="w"> </span><span class="m">26</span>.9<span class="w"> </span>ms<span class="o">]</span> <span class="w"> </span>Range<span class="w"> </span><span class="o">(</span>min<span class="w"> </span>…<span class="w"> </span>max<span class="o">)</span>:<span class="w"> </span><span class="m">354</span>.1<span class="w"> </span>ms<span class="w"> </span>…<span class="w"> </span><span class="m">362</span>.9<span class="w"> </span>ms<span class="w"> </span><span class="m">10</span><span class="w"> </span>runs Summary <span class="w"> </span><span class="s1">&#39;cat large-file.json | ./jqgo &#39;</span>.repo.url<span class="s1">&#39; &gt; jqgo.test&#39;</span><span class="w"> </span>ran <span class="w"> </span><span class="m">1</span>.46<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.05<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | ./control/control &#39;</span>.repo.url<span class="s1">&#39; &gt; control.test&#39;</span> <span class="w"> </span><span class="m">1</span>.69<span class="w"> </span>±<span class="w"> </span><span class="m">0</span>.02<span class="w"> </span><span class="nb">times</span><span class="w"> </span>faster<span class="w"> </span>than<span class="w"> </span><span class="s1">&#39;cat large-file.json | jq &#39;</span>.repo.url<span class="s1">&#39; &gt; jq.test&#39;</span> </pre></div> <p>And we've shaved another 20ms off. 
That's not bad!</p> <h3 id="coming-to-a-close">Coming to a close</h3><p>There is more we could do but this is a long post already.</p> <p>For example, in the project repo I also built a <a href="https://github.com/eatonphil/jqgo/blob/main/vector.go">generic vector type</a> with a pop operation that is used for the stack in the <code>eatValue</code> function. It is shared on the <code>jsonReader</code> instance like the <code>expectString</code> buffer. This ended up shaving off another 20ms. I also got rid of most conversions from <code>[]byte</code> to <code>string</code>, which is an expensive allocation; you may notice it listed as <code>bytes.String()</code> in the <code>top10</code> of <code>-alloc_objects</code> if you run the profiler again now.</p> <p>But hopefully you're getting the gist of how you might investigate CPU and memory usage. For me it's still a lot of poking around and trying different things. But after a few years of trying to get better at profiling Go programs I think I'm starting to get the hang of it.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post on implementing a simple jq clone from scratch in Go. This post explores partial/fuzzy parsing again and finishes with my approach to debugging memory/CPU usage in Go programs. It&#39;s a bit of a long post but hopefully worthwhile! 
:)<a href="https://t.co/DxilIVaUBa">https://t.co/DxilIVaUBa</a> <a href="https://t.co/as3Sr5I2G0">pic.twitter.com/as3Sr5I2G0</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1546470283334270977?ref_src=twsrc%5Etfw">July 11, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/implementing-a-jq-clone-in-go.htmlSun, 10 Jul 2022 00:00:00 +0000One year as a solo dev building open-source data tools without fundinghttp://notes.eatonphil.com/2022-06-11-year-in-review.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-06-11-year-in-review.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-06-11-year-in-review.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/2022-06-11-year-in-review.htmlFri, 10 Jun 2022 00:00:00 +0000Let's build a distributed Postgres proof of concepthttp://notes.eatonphil.com/distributed-postgres.html<p>What is CockroachDB under the hood? Take a look at <a href="https://github.com/cockroachdb/cockroach/blob/master/go.mod">its go.mod</a> and notice a number of dependencies that do a lot of work: <a href="https://github.com/jackc/pgproto3">a PostgreSQL wire protocol implementation</a>, <a href="https://github.com/cockroachdb/pebble">a storage layer</a>, <a href="https://github.com/etcd-io/etcd">a Raft implementation for distributed consensus</a>. And not part of go.mod but still building on 3rd party code, <a href="https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/parser/sql.y">PostgreSQL's grammar definition</a>.</p> <p>To be <em>absurdly</em> reductionist, CockroachDB is just the glue around these libraries. With that reductionist mindset, let's try building a distributed Postgres proof of concept ourselves! 
We'll use only four major external libraries: for parsing SQL, handling Postgres's wire protocol, handling Raft, and handling the storage of table metadata and rows themselves.</p> <p class="note"> For a not-reductionist understanding of the CockroachDB internals, I recommend following the excellent <a href="https://www.cockroachlabs.com/blog/">Cockroach Engineering blog</a> and <a href="https://www.twitch.tv/large__data__bank">Jordan Lewis's Hacking CockroachDB Twitch stream</a>. </p><p>By the end of this post, in around 600 lines of code, we'll have a distributed "Postgres implementation" that will accept writes (<code>CREATE TABLE</code>, <code>INSERT</code>) on the leader and accept reads (<code>SELECT</code>) on any node. All nodes will contain the same data.</p> <p>Here is a sample interaction against the leader:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>psql<span class="w"> </span>-h<span class="w"> </span>localhost<span class="w"> </span>-p<span class="w"> </span><span class="m">6000</span> psql<span class="w"> </span><span class="o">(</span><span class="m">13</span>.4,<span class="w"> </span>server<span class="w"> </span><span class="m">0</span>.0.0<span class="o">)</span> Type<span class="w"> </span><span class="s2">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>help. 
<span class="nv">phil</span><span class="o">=</span>&gt;<span class="w"> </span>create<span class="w"> </span>table<span class="w"> </span>x<span class="w"> </span><span class="o">(</span>age<span class="w"> </span>int,<span class="w"> </span>name<span class="w"> </span>text<span class="o">)</span><span class="p">;</span> CREATE<span class="w"> </span>ok <span class="nv">phil</span><span class="o">=</span>&gt;<span class="w"> </span>insert<span class="w"> </span>into<span class="w"> </span>x<span class="w"> </span>values<span class="o">(</span><span class="m">14</span>,<span class="w"> </span><span class="s1">&#39;garry&#39;</span><span class="o">)</span>,<span class="w"> </span><span class="o">(</span><span class="m">20</span>,<span class="w"> </span><span class="s1">&#39;ted&#39;</span><span class="o">)</span><span class="p">;</span> could<span class="w"> </span>not<span class="w"> </span>interpret<span class="w"> </span>result<span class="w"> </span>from<span class="w"> </span>server:<span class="w"> </span>INSERT<span class="w"> </span>ok INSERT<span class="w"> </span>ok <span class="nv">phil</span><span class="o">=</span>&gt;<span class="w"> </span><span class="k">select</span><span class="w"> </span>name,<span class="w"> </span>age<span class="w"> </span>from<span class="w"> </span>x<span class="p">;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age<span class="w"> </span> ---------+----- <span class="w"> </span><span class="s2">&quot;garry&quot;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">14</span> <span class="w"> </span><span class="s2">&quot;ted&quot;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">20</span> <span class="o">(</span><span class="m">2</span><span class="w"> </span>rows<span class="o">)</span> </pre></div> <p>And against a follower (note the different port):</p> <div 
class="highlight"><pre><span></span>$<span class="w"> </span>psql<span class="w"> </span>-h<span class="w"> </span><span class="m">127</span>.0.0.1<span class="w"> </span>-p<span class="w"> </span><span class="m">6001</span> psql<span class="w"> </span><span class="o">(</span><span class="m">13</span>.4,<span class="w"> </span>server<span class="w"> </span><span class="m">0</span>.0.0<span class="o">)</span> Type<span class="w"> </span><span class="s2">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span>help. <span class="nv">phil</span><span class="o">=</span>&gt;<span class="w"> </span><span class="k">select</span><span class="w"> </span>age,<span class="w"> </span>name<span class="w"> </span>from<span class="w"> </span>x<span class="p">;</span> <span class="w"> </span>age<span class="w"> </span><span class="p">|</span><span class="w"> </span>name -----+--------- <span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="s2">&quot;ted&quot;</span> <span class="w"> </span><span class="m">14</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="s2">&quot;garry&quot;</span> <span class="o">(</span><span class="m">2</span><span class="w"> </span>rows<span class="o">)</span> </pre></div> <p>All code for this post is <a href="https://github.com/eatonphil/waterbugdb">available on Github in the fondly named WaterbugDB repo</a>.</p> <h3 id="plan-of-attack">Plan of attack</h3><p>Influenced by <a href="https://youtu.be/rqO9PtBkiSQ?t=2332">Philip O'Toole's talk on rqlite at Hacker Nights</a> we'll have a Postgres wire protocol server in front. As it receives queries it will respond immediately to <code>SELECT</code>s. Otherwise for <code>CREATE TABLE</code>s and <code>INSERT</code>s it will send the entire query string to the Raft cluster. 
Each process that is part of the Raft cluster will implement the appropriate functions for handling Raft messages. In this case the messages will just be to create a table or insert data.</p> <p>So every running process will run a Postgres wire protocol server, a Raft server, and an HTTP server, which you'll see is an implementation detail of how processes join the same Raft cluster.</p> <p>Every running process will have its own directory for storing data.</p> <h3 id="raft">Raft</h3><p>There is likely a difference between Raft, the paper, and Raft, the implementations. When I refer to Raft in the rest of this post I'm going to be referring to an implementation.</p> <p>And although CockroachDB uses <a href="https://github.com/etcd-io/etcd">etcd's Raft implementation</a>, I didn't realize that when I started building this project. I used <a href="https://pkg.go.dev/github.com/hashicorp/raft">HashiCorp's Raft implementation</a>.</p> <p>Raft allows us to reliably keep multiple nodes in sync with a log of messages. Each node in the Raft cluster implements a finite state machine (FSM) with three operations: apply, snapshot, and restore. 
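</p> <p>To make those three operations concrete, here is a toy sketch that does not use the Raft library at all. <code>toyFSM</code> and its lowercase methods are hypothetical names chosen for illustration, and the library's real interface has different signatures. The state is just a list of applied entries.</p>

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toyFSM is a hypothetical stand-in for a Raft state machine.
// It only illustrates the three operations; it is not the library interface.
type toyFSM struct {
	rows []string
}

// apply consumes one committed log entry and mutates local state.
func (f *toyFSM) apply(entry string) {
	f.rows = append(f.rows, entry)
}

// snapshot serializes the whole state so older log entries could be dropped.
func (f *toyFSM) snapshot() ([]byte, error) {
	return json.Marshal(f.rows)
}

// restore replaces local state with a previously taken snapshot.
func (f *toyFSM) restore(snap []byte) error {
	return json.Unmarshal(snap, &f.rows)
}

func main() {
	// One node applies two entries and takes a snapshot.
	first := new(toyFSM)
	for _, entry := range []string{"create table x", "insert into x"} {
		first.apply(entry)
	}
	snap, err := first.snapshot()
	if err != nil {
		panic(err)
	}

	// A second node catches up from the snapshot, then applies a newer entry.
	second := new(toyFSM)
	if err := second.restore(snap); err != nil {
		panic(err)
	}
	second.apply("insert into x again")

	fmt.Println(len(first.rows), len(second.rows))
}
```

<p>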
Our finite state machine will embed a postgres engine we'll build out after this to handle query execution.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;net&quot;</span> <span class="w"> </span><span class="s">&quot;net/http&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;path&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/google/uuid&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/hashicorp/raft&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/hashicorp/raft-boltdb&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/jackc/pgproto3/v2&quot;</span> <span class="w"> </span><span class="nx">pgquery</span><span class="w"> </span><span class="s">&quot;github.com/pganalyze/pg_query_go/v2&quot;</span> <span class="w"> </span><span class="nx">bolt</span><span class="w"> </span><span class="s">&quot;go.etcd.io/bbolt&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">pgFsm</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span> 
<span class="p">}</span> </pre></div> <p>From what I understand, the snapshot operation allows Raft to truncate logs. It is used in conjunction with restoring. On startup if there is a snapshot, restore is called so you can load the snapshot. Then afterwards all logs not yet snapshotted are replayed through the apply operation.</p> <p>To keep this implementation simple we'll just fail all snapshots so restore will never be called and all logs will be replayed every time on startup through the apply operation. This is of course inefficient but it keeps the code simpler.</p> <p>When we write the startup code we'll need to delete the database so that these apply calls happen fresh.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="w"> </span><span class="kd">struct</span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Persist</span><span class="p">(</span><span class="nx">sink</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">SnapshotSink</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">sink</span><span class="p">.</span><span class="nx">Cancel</span><span class="p">()</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">sn</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">)</span><span class="w"> </span><span class="nx">Release</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span
class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Snapshot</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">FSMSnapshot</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">snapshotNoop</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Restore</span><span class="p">(</span><span class="nx">rc</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">ReadCloser</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Nothing to restore&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Finally, applying is receiving a single message and applying it for the node. In this project the message will be a <code>CREATE TABLE</code> or <code>INSERT</code> query. 
So we'll parse the query and pass it to the postgres engine for execution.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="nx">Apply</span><span class="p">(</span><span class="nx">log</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Log</span><span class="p">)</span><span class="w"> </span><span class="kd">interface</span><span class="p">{}</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">LogCommand</span><span class="p">:</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">log</span><span class="p">.</span><span class="nx">Data</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span 
class="p">(</span><span class="s">&quot;Could not parse payload: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pf</span><span class="p">.</span><span class="nx">pe</span><span class="p">.</span><span class="nx">execute</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown raft log type: %#v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Type</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Panic-ing here is actually the <a href="https://github.com/hashicorp/raft/issues/307">advised behavior</a>.</p> <h4 id="raft-server">Raft server</h4><p>Now we can set up the actual Raft server and pass an instance of this FSM. 
This is a bunch of boilerplate that would matter in production installs but for us basically we just need to tell Raft where to run and how to store its own internal data, including its all-important message log.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">nodeId</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="w"> </span><span class="o">*</span><span class="nx">pgFsm</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span> <span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raftboltdb</span><span class="p">.</span><span class="nx">NewBoltStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;bolt&quot;</span><span class="p">))</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create bolt store: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewFileSnapshotStore</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;snapshot&quot;</span><span class="p">),</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could 
not create snapshot store: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">ResolveTCPAddr</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">raftAddress</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not resolve address: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">transport</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewTCPTransport</span><span class="p">(</span><span class="nx">raftAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">tcpAddr</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span 
class="nx">Second</span><span class="o">*</span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Stderr</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create tcp transport: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">raftCfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">DefaultConfig</span><span class="p">()</span> <span class="w"> </span><span class="nx">raftCfg</span><span class="p">.</span><span class="nx">LocalID</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">)</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">NewRaft</span><span class="p">(</span><span class="nx">raftCfg</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="p">,</span><span class="w"> 
</span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">store</span><span class="p">,</span><span class="w"> </span><span class="nx">snapshots</span><span class="p">,</span><span class="w"> </span><span class="nx">transport</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not create raft instance: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Cluster consists of unjoined leaders. 
Picking a leader and</span> <span class="w"> </span><span class="c1">// creating a real cluster is done manually after startup.</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">BootstrapCluster</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Configuration</span><span class="p">{</span> <span class="w"> </span><span class="nx">Servers</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Server</span><span class="p">{</span> <span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ID</span><span class="p">:</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">nodeId</span><span class="p">),</span> <span class="w"> </span><span class="nx">Address</span><span class="p">:</span><span class="w"> </span><span class="nx">transport</span><span class="p">.</span><span class="nx">LocalAddr</span><span class="p">(),</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Every instance of this process will run this and will start off as a leader in a new cluster. We'll expose an HTTP server that allows a leader to talk to other leaders to tell them to stop leading and follow it. This HTTP endpoint is how we'll get from N processes with N leaders and N clusters to N processes with 1 leader and 1 cluster.</p> <p>That's basically it for the core Raft bits. 
So let's build out that HTTP server and follow endpoint.</p> <h3 id="http-follow-endpoint">HTTP follow endpoint</h3><p>Our HTTP server will have just one endpoint that tells the process (a) to contact another process (b) so that process (b) joins the process (a) cluster.</p> <p>The HTTP server will need to have the process (a)'s Raft instance to be able to start this join action. And in order for Raft to know how to contact the process (b) we'll need to tell it both the process (b)'s unique Raft node id (we'll give it a unique id ourselves when we start the process) and the process (b)'s Raft server port.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">httpServer</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">hs</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">)</span><span class="w"> </span><span class="nx">addFollowerHandler</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">followerId</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span 
class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;id&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">followerAddr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;addr&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">State</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">Leader</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">).</span><span class="nx">Encode</span><span class="p">(</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Error</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="s">`json:&quot;error&quot;`</span> <span class="w"> </span><span class="p">}{</span> <span class="w"> </span><span class="s">&quot;Not the leader&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">AddVoter</span><span class="p">(</span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerID</span><span class="p">(</span><span class="nx">followerId</span><span class="p">),</span><span class="w"> </span><span class="nx">raft</span><span class="p">.</span><span class="nx">ServerAddress</span><span class="p">(</span><span class="nx">followerAddr</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">).</span><span class="nx">Error</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Failed to add follower: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">Error</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="nx">http</span><span
class="p">.</span><span class="nx">StatusText</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">),</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>That's it! Let's move on to the query engine.</p> <h3 id="query-engine">Query engine</h3><p>The query engine is a wrapper around a storage layer. We'll bring in <a href="https://github.com/etcd-io/bbolt">bbolt</a>.</p> <p class="note"> I originally built this with <a href="https://github.com/cockroachdb/pebble">Cockroach's pebble</a> but pebble has a <a href="https://app.bountysource.com/issues/99017984-unable-to-build-xxhash-conflicts-with-other-package">transitive dependency on a C library that has function names that conflict with function names in the C library that pg_query_go wraps</a>. 
</p><div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgEngine</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span> <span class="w"> </span><span class="nx">bucketName</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pgEngine</span><span class="p">{</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;data&quot;</span><span class="p">)}</span> <span class="p">}</span> </pre></div> <p class="note"> bbolt organizes data into buckets. Buckets might be a natural way to store table rows (one bucket per table) but to keep the implementation simple we'll put all table metadata and row data into a single <code>data</code> bucket. </p><p>The entrypoint we called in the Raft apply implementation above was <code>execute</code>. It took a parsed list of statements. 
We'll iterate over the statements, figuring out the kind of each statement, and call out to a dedicated helper for each kind.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">execute</span><span class="p">(</span><span class="nx">tree</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">ParseResult</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tree</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetStmt</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetCreateStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> 
</span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeCreate</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetInsertStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeInsert</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetSelectStmt</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeSelect</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown statement type: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p class="note"> The pg_query_go docs are not super helpful. I had to build a <a href="https://github.com/eatonphil/waterbugdb/blob/main/astexplorer/main.go">separate AST explorer program</a> to make it easier to understand this parser. </p><p>Let's start with creating a table.</p> <h3 id="create-table">Create table</h3><p>When a table is created, we'll need to store its metadata.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">tableDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">ColumnTypes</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="p">}</span> </pre></div> <p>First we pull that metadata out of the AST.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeCreate</span><span class="p">(</span><span class="nx">stmt</span><span class="w"> 
</span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">CreateStmt</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tbl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tableDefinition</span><span class="p">{}</span> <span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">Name</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Relation</span><span class="p">.</span><span class="nx">Relname</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">TableElts</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">GetColumnDef</span><span class="p">()</span> <span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="p">,</span><span class="w"> </span><span class="nx">cd</span><span class="p">.</span><span class="nx">Colname</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Names is namespaced. So `INT` is pg_catalog.int4. 
`BIGINT` is pg_catalog.int8.</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">cd</span><span class="p">.</span><span class="nx">TypeName</span><span class="p">.</span><span class="nx">Names</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;.&quot;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">n</span><span class="p">.</span><span class="nx">GetString_</span><span class="p">().</span><span class="nx">Str</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now we need to store this 
in the storage layer. The easiest/dumbest way to do this is to serialize the metadata to JSON and store it with key: <code>tables_${tableName}</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">tableBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">tbl</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not marshal table: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Update</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bkt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">CreateBucketIfNotExists</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Put</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;tables_&quot;</span><span class="o">+</span><span class="nx">tbl</span><span class="p">.</span><span class="nx">Name</span><span class="p">),</span><span class="w"> </span><span class="nx">tableBytes</span><span class="p">)</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not set key-value: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Next we'll build a 
helper to reverse that operation, pulling out table metadata from the storage layer by the table name:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">getTableDefinition</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">tableDefinition</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">tbl</span><span class="w"> </span><span class="nx">tableDefinition</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">View</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bkt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">bkt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Table does not exist&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">valBytes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;tables_&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">name</span><span class="p">))</span> <span class="w"> </span><span class="c1">// bolt returns nil when the key is missing.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valBytes</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Table does not exist&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">tbl</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not unmarshal table: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">tbl</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <p>That's it for our basic <code>CREATE TABLE</code> support! Let's do <code>INSERT</code> next.</p> <h3 id="insert-row">Insert row</h3><p>Our <code>INSERT</code> implementation will only support literal/constant <code>VALUES</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeInsert</span><span class="p">(</span><span class="nx">stmt</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">InsertStmt</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Relation</span><span class="p">.</span><span class="nx">Relname</span> <span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetSelectStmt</span><span class="p">().</span><span class="nx">GetSelectStmt</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">ValuesLists</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">rowData</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">values</span><span class="p">.</span><span class="nx">GetList</span><span class="p">().</span><span class="nx">Items</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">GetAConst</span><span class="p">();</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetString_</span><span class="p">();</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">rowData</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rowData</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Str</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetInteger</span><span class="p">();</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">rowData</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rowData</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">Ival</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown value type: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>It would be better to abstract this <code>VALUES</code> code into a helper so it could be used by <code>SELECT</code>s too but out of laziness we'll just 
keep this here.</p> <p>Next we need to write the row to the storage layer. We'll serialize the row data to JSON (inefficient because we know the row structure, but JSON is easy). We'll store each row under a key made of a prefix that includes the table name, followed by a UUID. When we iterate over the rows in a table, a prefix scan will recover just the rows in that table.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">rowBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">rowData</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not marshal row: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Update</span><span 
class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bkt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">CreateBucketIfNotExists</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bkt</span><span class="p">.</span><span class="nx">Put</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;rows_&quot;</span><span class="o">+</span><span class="nx">tblName</span><span class="o">+</span><span class="s">&quot;_&quot;</span><span class="o">+</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">rowBytes</span><span class="p">)</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not store row: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Finally, we can move on to supporting <code>SELECT</code>!</p> <h3 id="select-rows">Select rows</h3><p>Unlike <code>CREATE TABLE</code> and <code>INSERT</code>, <code>SELECT</code> will need to return rows, column names, and, because the Postgres wire protocol wants them, column types.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgResult</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">fieldTypes</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="kt">any</span> <span class="p">}</span> </pre></div> <p>First we pull out the table name and the fields selected, looking up field types in the table metadata.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pe</span><span class="w"> </span><span class="o">*</span><span class="nx">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="nx">executeSelect</span><span 
class="p">(</span><span class="nx">stmt</span><span class="w"> </span><span class="o">*</span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">SelectStmt</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">pgResult</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">FromClause</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">GetRangeVar</span><span class="p">().</span><span class="nx">Relname</span> <span class="w"> </span><span class="nx">tbl</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">getTableDefinition</span><span class="p">(</span><span class="nx">tblName</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pgResult</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span 
class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">TargetList</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fieldName</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">GetResTarget</span><span class="p">().</span><span class="nx">Val</span><span class="p">.</span><span class="nx">GetColumnRef</span><span class="p">().</span><span class="nx">Fields</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">GetString_</span><span class="p">().</span><span class="nx">Str</span> <span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldName</span><span class="p">)</span> <span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">cn</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">cn</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">fieldName</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">fieldType</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown field: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldName</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">fieldType</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Finally, we do a prefix scan to grab all rows in the table from the storage layer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prefix</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;rows_&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tblName</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;_&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">View</span><span class="p">(</span><span class="kd">func</span><span class="p">(</span><span class="nx">tx</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Tx</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tx</span><span class="p">.</span><span class="nx">Bucket</span><span class="p">(</span><span class="nx">pe</span><span class="p">.</span><span class="nx">bucketName</span><span class="p">).</span><span class="nx">Cursor</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Seek</span><span class="p">(</span><span class="nx">prefix</span><span class="p">);</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span
class="nx">HasPrefix</span><span class="p">(</span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="p">);</span><span class="w"> </span><span class="nx">k</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">row</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unable to unmarshal row: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">targetRow</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span
class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">target</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">tbl</span><span class="p">.</span><span class="nx">ColumnNames</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">target</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">targetRow</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">targetRow</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">targetRow</span><span class="p">)</span> <span
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <p>That's it for <code>SELECT</code>! The last function we'll implement is a helper for deleting all data in the storage layer. This will be called on startup before Raft logs are applied so the database always ends up in a consistent state.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> </span><span class="p">(</span><span class="n">pe</span><span class="w"> </span><span class="o">*</span><span class="n">pgEngine</span><span class="p">)</span><span class="w"> </span><span class="n">delete</span><span class="p">()</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">pe</span><span class="o">.</span><span class="n">db</span><span class="o">.</span><span class="n">Update</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">tx</span><span class="w"> </span><span class="o">*</span><span class="n">bolt</span><span class="o">.</span><span class="n">Tx</span><span class="p">)</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">bkt</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">tx</span><span class="o">.</span><span class="n">Bucket</span><span class="p">(</span><span class="n">pe</span><span class="o">.</span><span class="n">bucketName</span><span class="p">)</span> 
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">bkt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">tx</span><span class="o">.</span><span class="n">DeleteBucket</span><span class="p">(</span><span class="n">pe</span><span class="o">.</span><span class="n">bucketName</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="p">})</span> <span class="p">}</span> </pre></div> <p>And we're ready to move on to the final layer, the Postgres wire protocol.</p> <h3 id="postgres-wire-protocol-server">Postgres wire protocol server</h3><p><a href="https://github.com/jackc/pgproto3">jackc/pgproto3</a> is an implementation of the Postgres wire protocol for Go. It allows us to implement a server that can respond to requests by Postgres clients like <code>psql</code>.</p> <p>It works by wrapping a TCP connection. 
So we'll start by building a function that does the TCP serving loop.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">runPgServer</span><span class="p">(</span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ln</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;localhost:&quot;</span><span class="o">+</span><span class="nx">port</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ln</span><span class="p">.</span><span class="nx">Accept</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">{</span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">}</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handle</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>The <code>pgConn</code> instance needs access to the database directly so it can respond to <code>SELECT</code>s. 
And it needs the Raft instance for all other queries.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">pgConn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">conn</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Conn</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">bolt</span><span class="p">.</span><span class="nx">DB</span> <span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">raft</span><span class="p">.</span><span class="nx">Raft</span> <span class="p">}</span> </pre></div> <p>The <code>handle</code> function we called above will grab the current message via the pgproto3 package and handle startup messages and regular messages.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">pgc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">NewBackend</span><span class="p">(</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">NewChunkReader</span><span class="p">(</span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">),</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">)</span> <span class="w"> </span><span class="k">defer</span><span class="w"> 
</span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> 
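<p>For intuition about what the startup step in <code>handle</code> reads off the connection before the message loop begins, here is a minimal sketch that uses only the standard library. It assumes the packet layout described in the Postgres protocol documentation: a 4-byte big-endian length that counts itself, followed by a 4-byte code. <code>classifyStartup</code> and <code>buildPacket</code> are hypothetical names used for illustration only; in the real server pgproto3 does this decoding for us.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Codes taken from the Postgres frontend/backend protocol documentation.
const (
	sslRequestCode = 80877103 // 1234<<16 | 5679
	protocolV3     = 196608   // 3<<16 | 0
)

// buildPacket is a hypothetical helper that frames a startup-phase packet:
// a length that includes itself, a 4-byte code, and an optional body.
func buildPacket(code uint32, body []byte) []byte {
	packet := make([]byte, 8+len(body))
	binary.BigEndian.PutUint32(packet[0:4], uint32(len(packet)))
	binary.BigEndian.PutUint32(packet[4:8], code)
	copy(packet[8:], body)
	return packet
}

// classifyStartup is a hypothetical helper that reports which kind of
// startup-phase packet the client sent.
func classifyStartup(packet []byte) (string, error) {
	if len(packet) < 8 {
		return "", fmt.Errorf("startup packet too short: %d bytes", len(packet))
	}
	length := binary.BigEndian.Uint32(packet[0:4])
	if int(length) != len(packet) {
		return "", fmt.Errorf("length field %d does not match packet size %d", length, len(packet))
	}
	switch code := binary.BigEndian.Uint32(packet[4:8]); code {
	case sslRequestCode:
		// The server answers this with a single byte: 'S' or 'N'.
		return "ssl-request", nil
	case protocolV3:
		// The body holds null-terminated key/value pairs such as "user".
		return "startup", nil
	default:
		return "", fmt.Errorf("unknown startup code: %d", code)
	}
}

func main() {
	kind, err := classifyStartup(buildPacket(sslRequestCode, nil))
	fmt.Println(kind, err)

	kind, err = classifyStartup(buildPacket(protocolV3, []byte("user\x00phil\x00\x00")))
	fmt.Println(kind, err)
}
```

<p>A client such as <code>psql</code> typically sends the SSL request first and, once it is declined, follows up with the regular startup packet on the same connection. That is why the startup step may need to read more than one packet before the query loop starts.</p>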
<p>Startup messages include authentication and SSL negotiation. We'll accept any client for the former and respond "no" to the latter.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgconn</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">startupMessage</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgconn</span><span class="p">.</span><span class="nx">ReceiveStartupMessage</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error receiving startup message: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">startupMessage</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">StartupMessage</span><span class="p">:</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="o">&amp;</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">AuthenticationOk</span><span class="p">{}).</span><span class="nx">Encode</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&amp;</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">ReadyForQuery</span><span class="p">{</span><span class="nx">TxStatus</span><span class="p">:</span><span class="w"> </span><span class="sc">&#39;I&#39;</span><span class="p">}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error 
sending ready for query: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">SSLRequest</span><span class="p">:</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;N&quot;</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error sending deny SSL request: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">handleStartupMessage</span><span class="p">(</span><span class="nx">pgconn</span><span class="p">)</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unknown startup message: %#v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">startupMessage</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Within the main <code>handleMessage</code> logic we'll check the type of message.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">handleMessage</span><span class="p">(</span><span class="nx">pgc</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgc</span><span class="p">.</span><span class="nx">Receive</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error receiving message: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> 
<span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Query</span><span class="p">:</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Received message other than Query from client: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>If the message is a query we'll parse it and respond immediately to <code>SELECT</code>s.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">msg</span><span class="p">.(</span><span class="kd">type</span><span 
class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Query</span><span class="p">:</span> <span class="w"> </span><span class="nx">stmts</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pgquery</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error parsing query: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">stmts</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">())</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Send exactly one statement per query.&quot;</span><span class="p">)</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmts</span><span class="p">.</span><span class="nx">GetStmts</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="c1">// Handle SELECTs here</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">GetStmt</span><span class="p">().</span><span class="nx">GetSelectStmt</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">pc</span><span class="p">.</span><span class="nx">db</span><span class="p">)</span> <span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nx">executeSelect</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">writePgResult</span><span class="p">(</span><span class="nx">res</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>(We'll implement that <code>writePgResult</code> helper shortly below.) Otherwise we'll add the query to the Raft log and return a basic response.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Otherwise it&#39;s DDL/DML, raftify</span> <span class="w"> </span><span class="nx">future</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">r</span><span class="p">.</span><span class="nx">Apply</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">),</span><span class="w"> </span><span class="mi">500</span><span class="o">*</span><span class="nx">time</span><span class="p">.</span><span class="nx">Millisecond</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Error</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not apply: %s&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">future</span><span class="p">.</span><span class="nx">Response</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not apply (internal): %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">done</span><span class="p">(</span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">String</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">)[</span><span class="mi">0</span><span class="p">])</span><span class="o">+</span><span class="s">&quot; ok&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">Terminate</span><span class="p">:</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Received message other than Query from client: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p><code>done</code> is an important helper that tells the Postgres connection that the query is complete and the server is ready to receive another query. Without this response <code>psql</code> just hangs.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">done</span><span class="p">(</span><span class="nx">buf</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&amp;</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">CommandComplete</span><span class="p">{</span><span class="nx">CommandTag</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span 
class="nx">msg</span><span class="p">)}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">(</span><span class="o">&amp;</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">ReadyForQuery</span><span class="p">{</span><span class="nx">TxStatus</span><span class="p">:</span><span class="w"> </span><span class="sc">&#39;I&#39;</span><span class="p">}).</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Failed to write query response: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And now let's implement the <code>writePgResult</code> helper. 
This function needs to translate from our <code>pgResult</code> struct to the format required by pgproto3.</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">dataTypeOIDMap</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">uint32</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;text&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">25</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;pg_catalog.int4&quot;</span><span class="p">:</span><span class="w"> </span><span class="mi">23</span><span class="p">,</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">pc</span><span class="w"> </span><span class="nx">pgConn</span><span class="p">)</span><span class="w"> </span><span class="nx">writePgResult</span><span class="p">(</span><span class="nx">res</span><span class="w"> </span><span class="o">*</span><span class="nx">pgResult</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">rd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">RowDescription</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">field</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">fieldNames</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="nx">rd</span><span class="p">.</span><span class="nx">Fields</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">rd</span><span class="p">.</span><span class="nx">Fields</span><span class="p">,</span><span class="w"> </span><span class="nx">pgproto3</span><span class="p">.</span><span class="nx">FieldDescription</span><span class="p">{</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">field</span><span class="p">),</span> <span class="w"> </span><span class="nx">DataTypeOID</span><span class="p">:</span><span class="w"> </span><span class="nx">dataTypeOIDMap</span><span class="p">[</span><span class="nx">res</span><span class="p">.</span><span class="nx">fieldTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">]],</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rd</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">res</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span 
class="nx">pgproto3</span><span class="p">.</span><span class="nx">DataRow</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Failed to marshal cell: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">dr</span><span class="p">.</span><span class="nx">Values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">dr</span><span class="p">.</span><span class="nx">Values</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dr</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">buf</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pc</span><span class="p">.</span><span class="nx">done</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;SELECT %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">res</span><span class="p">.</span><span class="nx">rows</span><span class="p">)))</span> <span class="p">}</span> </pre></div> <p>And we're done with everything but <code>func main()</code>!</p> <h3 id="main">Main</h3><p>On startup, each process must be assigned (by the parent process) a unique node id (any unique string is ok) and ports for the Raft server, Postgres server, and HTTP server. 
We'll build a short <code>getConfig</code> helper to grab these from arguments.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">httpPort</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">raftPort</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">pgPort</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span><span class="w"> </span><span class="nx">config</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">config</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--node-id&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span 
class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--http-port&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--raft-port&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> 
</span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--pg-port&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">2</span><span class="p">]</span> <span class="w"> </span><span class="nx">i</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --node-id&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span 
class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --raft-port&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --http-port&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Missing required parameter: --pg-port&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cfg</span> <span class="p">}</span> </pre></div> <p>Now in <code>main</code> we'll grab the config and set up this process's database. All processes will put their data in a top-level <code>data</code> directory to make managing the directories easier. 
But within that directory each process will have its own unique directories for data storage, named after its unique node id.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cfg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getConfig</span><span class="p">()</span> <span class="w"> </span><span class="nx">dataDir</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;data&quot;</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">MkdirAll</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">ModePerm</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Could not create data directory: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bolt</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span 
class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;/data&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="mo">0600</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Could not open bolt db: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> </pre></div> <p>We need to clean up the database.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">pe</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newPgEngine</span><span class="p">(</span><span class="nx">db</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Start off in clean state</span> <span class="w"> </span><span class="nx">pe</span><span class="p">.</span><span class="nb">delete</span><span class="p">()</span> </pre></div> <p>Set up the Raft server.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">pf</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="o">&amp;</span><span class="nx">pgFsm</span><span class="p">{</span><span class="nx">pe</span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">setupRaft</span><span class="p">(</span><span class="nx">path</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">dataDir</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;raft&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">cfg</span><span class="p">.</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;localhost:&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">raftPort</span><span class="p">,</span><span class="w"> </span><span class="nx">pf</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Set up the HTTP server.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">hs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httpServer</span><span class="p">{</span><span class="nx">r</span><span class="p">}</span> <span class="w"> </span><span class="nx">http</span><span 
class="p">.</span><span class="nx">HandleFunc</span><span class="p">(</span><span class="s">&quot;/add-follower&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hs</span><span class="p">.</span><span class="nx">addFollowerHandler</span><span class="p">)</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="kd">func</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">httpPort</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}()</span> </pre></div> <p>And finally, kick off the Postgres server.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">runPgServer</span><span class="p">(</span><span class="nx">cfg</span><span class="p">.</span><span class="nx">pgPort</span><span class="p">,</span><span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Finally. Finally. Finally done. 
Let's give it a go. :)</p> <h3 id="what-hath-god-wrought">What hath god wrought</h3><p>First, initialize the go module and then build the app.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>waterbugdb $<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy $<span class="w"> </span>go<span class="w"> </span>build </pre></div> <p>Now in terminal 1 start an instance of the database,</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./waterbugdb<span class="w"> </span>--node-id<span class="w"> </span>node1<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2222</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8222</span><span class="w"> </span>--pg-port<span class="w"> </span><span class="m">6000</span> </pre></div> <p>Then in terminal 2 start another instance.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./waterbugdb<span class="w"> </span>--node-id<span class="w"> </span>node2<span class="w"> </span>--raft-port<span class="w"> </span><span class="m">2223</span><span class="w"> </span>--http-port<span class="w"> </span><span class="m">8223</span><span class="w"> </span>--pg-port<span class="w"> </span><span class="m">6001</span> </pre></div> <p>And in terminal 3, tell <code>node1</code> to have <code>node2</code> follow it.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span><span class="s1">&#39;localhost:8222/add-follower?addr=localhost:2223&amp;id=node2&#39;</span> </pre></div> <p>And then open <code>psql</code> against port <code>6000</code>, the leader.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span 
class="n">localhost</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span> <span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span> <span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="k">Type</span><span class="w"> </span><span class="ss">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">create</span><span class="w"> </span><span class="k">table</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="p">(</span><span class="n">age</span><span class="w"> </span><span class="nb">int</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">text</span><span class="p">);</span> <span class="k">CREATE</span><span class="w"> </span><span class="n">ok</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span 
class="k">values</span><span class="p">(</span><span class="mi">14</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;garry&#39;</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">20</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ted&#39;</span><span class="p">);</span> <span class="n">could</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="n">interpret</span><span class="w"> </span><span class="k">result</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">server</span><span class="p">:</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span> <span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span> <span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">age</span><span class="w"> </span> <span class="c1">---------+-----</span> <span class="w"> </span><span class="ss">&quot;garry&quot;</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">14</span> <span class="w"> </span><span class="ss">&quot;ted&quot;</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">20</span> <span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span> </pre></div> <p>Now kill <code>node1</code> in terminal 1. Then start it up again. 
<code>node2</code> will now be the leader. So exit <code>psql</code> in terminal 3 and enter it again pointed at <code>node2</code>, port <code>6001</code>. Add new data.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6001</span> <span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="k">Type</span><span class="w"> </span><span class="ss">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">insert</span><span class="w"> </span><span class="k">into</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="k">values</span><span class="p">(</span><span class="mi">19</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ava&#39;</span><span class="p">),</span><span class="w"> </span><span class="p">(</span><span class="mi">18</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ming&#39;</span><span class="p">);</span> <span class="n">could</span><span class="w"> </span><span class="k">not</span><span 
class="w"> </span><span class="n">interpret</span><span class="w"> </span><span class="k">result</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">server</span><span class="p">:</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="n">ok</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span> <span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span> <span class="c1">-----+---------</span> <span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;ted&quot;</span> <span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;garry&quot;</span> <span class="w"> </span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;ming&quot;</span> <span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;ava&quot;</span> </pre></div> <p>Exit <code>psql</code> in terminal 3 and start it up again against <code>node1</code> again, port <code>6000</code>.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">psql</span><span class="w"> </span><span class="o">-</span><span class="n">h</span><span class="w"> </span><span class="mi">127</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span 
class="mi">0</span><span class="p">.</span><span class="mi">1</span><span class="w"> </span><span class="o">-</span><span class="n">p</span><span class="w"> </span><span class="mi">6000</span> <span class="n">psql</span><span class="w"> </span><span class="p">(</span><span class="mi">13</span><span class="p">.</span><span class="mi">4</span><span class="p">,</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">)</span> <span class="k">Type</span><span class="w"> </span><span class="ss">&quot;help&quot;</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">help</span><span class="p">.</span> <span class="n">phil</span><span class="o">=&gt;</span><span class="w"> </span><span class="k">select</span><span class="w"> </span><span class="n">age</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">x</span><span class="p">;</span> <span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span> <span class="c1">-----+---------</span> <span class="w"> </span><span class="mi">20</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;ted&quot;</span> <span class="w"> </span><span class="mi">14</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;garry&quot;</span> <span class="w"> </span><span class="mi">18</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="ss">&quot;ming&quot;</span> <span class="w"> </span><span class="mi">19</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span 
class="ss">&quot;ava&quot;</span> <span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="k">rows</span><span class="p">)</span> </pre></div> <p>Nifty stuff.</p> <h3 id="summary">Summary</h3><p>So on the one hand this was a more complex post than my usual. Each process needed three servers running. Two of those servers we managed directly and the Raft server was managed by the Raft library.</p> <p>On the other hand, we did this all in a really small amount of code. Yes, many edge cases were unhandled and a massive amount of SQL was unhandled. And yes, there are tons of inefficiencies like using JSON, an unstructured format, when every table has a fixed structure. But hopefully now you have an idea of how a project like this <em>could be structured</em>. And there's the beginnings of a framework for filling in syntax/edge cases over time.</p> <p>Additionally, the only problem we solved with consensus was replication, not sharding. Sharding, and its more complicated cousin (cross-shard transactions), is truly the special sauce Cockroach brings.</p> <p>Read more about building an intuition for sharding, replication, and distributed consensus <a href="https://notes.eatonphil.com/2024-02-08-an-intuition-for-distributed-consensus-in-oltp-systems.html">here</a>.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New blog post is up :) Let&#39;s build a distributed postgres proof of concept.<a href="https://t.co/Z8BDzF1bUw">https://t.co/Z8BDzF1bUw</a> <a href="https://t.co/aSkOjr9Yrh">pic.twitter.com/aSkOjr9Yrh</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1526598365634605058?ref_src=twsrc%5Etfw">May 17, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/distributed-postgres.htmlTue, 17 May 2022 00:00:00 
+0000SQLite in Go, with and without cgohttp://notes.eatonphil.com/sqlite-in-go-with-and-without-cgo.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-05-12-sqlite-in-go-with-and-without-cgo.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/sqlite-in-go-with-and-without-cgo.htmlThu, 12 May 2022 00:00:00 +0000HTML event handler attributes: down the rabbit holehttp://notes.eatonphil.com/event-handler-attributes.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-04-26-event-handler-attributes.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-04-26-event-handler-attributes.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/event-handler-attributes.htmlTue, 26 Apr 2022 00:00:00 +0000Interview With Phil of DataStationhttp://notes.eatonphil.com/console-101.html<head> <meta http-equiv="refresh" content="4;URL='https://console.substack.com/p/console-101'" /> </head><p>This is an external interview. Click <a href="https://console.substack.com/p/console-101">here</a> if you are not redirected.</p> http://notes.eatonphil.com/console-101.htmlSun, 17 Apr 2022 00:00:00 +0000Surveying SQL parser libraries in a few high-level languageshttp://notes.eatonphil.com/sql-parsers.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-04-11-sql-parsers.html'" /> </head><p>This is an external post of mine. 
Click <a href="https://datastation.multiprocess.io/blog/2022-04-11-sql-parsers.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/sql-parsers.htmlMon, 11 Apr 2022 00:00:00 +0000Writing a document database from scratch in Go: Lucene-like filters and indexeshttp://notes.eatonphil.com/documentdb.html<p>In this post we'll write a rudimentary document database from scratch in Go. In less than 500 lines of code we'll be able to support the following interactions, inspired by Elasticsearch:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Content-Type: application/json&#39;</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;name&quot;: &quot;Kevin&quot;, &quot;age&quot;: &quot;45&quot;}&#39;</span><span class="w"> </span>http://localhost:8080/docs <span class="o">{</span><span class="s2">&quot;body&quot;</span>:<span class="o">{</span><span class="s2">&quot;id&quot;</span>:<span class="s2">&quot;5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1&quot;</span><span class="o">}</span>,<span class="s2">&quot;status&quot;</span>:<span class="s2">&quot;ok&quot;</span><span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=name:&quot;Kevin&quot;&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span 
class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;age&quot;</span>:<span class="w"> </span><span class="s2">&quot;45&quot;</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span><span class="s2">&quot;Kevin&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=age:&lt;50&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;age&quot;</span>:<span class="w"> </span><span class="s2">&quot;45&quot;</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span><span class="s2">&quot;Kevin&quot;</span> <span 
class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;5ac64e74-58f9-4ba4-909e-1d5bf4ddcaa1&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> </pre></div> <p>The last query (<code>age:&lt;50</code>), being a range query, will do a full table scan. But the query before it (<code>name:"Kevin"</code>), an exact match, will use an index and be much faster.</p> <p class="note"> Document databases in general may be able to support indexes on ranges but our rudimentary one won't. <br /> <br /> Furthermore, this post will not implement full-text search. </p><p>All code for this project is <a href="https://github.com/eatonphil/docdb">available on GitHub</a>. Let's get started.</p> <h3 id="server-basics">Server basics</h3><p>Run <code>go mod init</code> and set up <code>main.go</code> with <a href="https://github.com/julienschmidt/httprouter">Julien Schmidt's httprouter</a>. 
We'll create three routes: one for inserting a document, one for retrieving a document by its ID, and one for searching for documents.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;encoding/json&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;net/http&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/julienschmidt/httprouter&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="s">&quot;8080&quot;</span><span class="p">}</span> <span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span 
class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs/:id&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Listening on &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>Now add the routes:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span 
class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span 
class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unimplemented&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>That's good enough for now! Let's think about storage.</p> <h3 id="storage">Storage</h3><p>If you wanted to do this project fully from scratch you could handle storage by just writing JSON blobs to disk. Nothing in this project will be much more complex than just writing JSON to disk and the equivalent of using <code>ls</code> on the filesystem. I mention this because I said this project is "from scratch" but I'm going to bring in a storage engine. My point is that you could easily follow this post and just read/write directly to disk if you felt strongly.</p> <p class="note"> Because there were so many folks misconstruing this paragraph, I've ported this blog post without Pebble as proof :D. You can <a href="https://github.com/eatonphil/docdb/pull/1">find the diff here</a>. Took me an hour for the +40/-40 diff that is still &lt;500 lines of code. You may notice the code basically looks identical. That's because the storage engine isn't the interesting part. :) </p><p>Any storage engine would be fine: direct read/write, SQLite, PostgreSQL. But we're going to grab a key-value storage engine. 
I've used Badger before so I'm going to try out <a href="https://github.com/cockroachdb/pebble">Cockroach Labs' Pebble</a> this time instead.</p> <p>Add <code>"github.com/cockroachdb/pebble"</code> to the list of imports. Then upgrade the <code>server</code> struct to store an instance of a Pebble database.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span> <span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="nx">database</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">server</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="p">:</span><span class="w"> </span><span class="nx">port</span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">database</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <p>And upgrade main:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="s">&quot;docdb.data&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;8080&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> 
<span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs/:id&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Listening on &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span 
class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>In the future these server settings could be user-configurable. For now they're hard-coded.</p> <h4 id="storing-data">Storing data</h4><p>When the user sends a JSON document we need to give it a unique ID and store the ID and document in the database. Since we're using a key-value storage engine we'll just use the ID as the key and the JSON document as the value.</p> <p>To generate the ID we'll use <a href="https://github.com/google/uuid">Google's UUID package</a>. So make sure to import <code>"github.com/google/uuid"</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span 
class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// New unique id for the document</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span 
class="p">(</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jsonResponse</span><span 
class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Nothing special: just accept a JSON POST body and store it in the database, return the generated document id.</p> <p class="note"> I'm not sure that using UUIDs here is a good idea but it is easier than keeping track of the number of rows in the database. </p><p>The <code>jsonResponse</code> helper can be defined as:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">data</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;body&quot;</span><span class="p">:</span><span 
class="w"> </span><span class="nx">body</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;status&quot;</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;ok&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Headers must be set before WriteHeader is called; later changes are ignored</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">Header</span><span class="p">().</span><span class="nx">Set</span><span class="p">(</span><span class="s">&quot;Content-Type&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;application/json&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusOK</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="s">&quot;status&quot;</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;error&quot;</span> <span class="w"> </span><span class="nx">data</span><span class="p">[</span><span class="s">&quot;error&quot;</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">()</span> <span class="w"> </span><span class="nx">w</span><span class="p">.</span><span class="nx">WriteHeader</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">StatusBadRequest</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span
class="w"> </span><span class="nx">enc</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewEncoder</span><span class="p">(</span><span class="nx">w</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">enc</span><span class="p">.</span><span class="nx">Encode</span><span class="p">(</span><span class="nx">data</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: set up panic handler?</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>It's a basic wrapper so that all responses are structured JSON.</p> <h4 id="retrieving-by-id">Retrieving by ID</h4><p>Before we try to test out inserts, let's get retrieval hooked up. Inserts return an ID in the HTTP response. 
GETs will grab a document by ID.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocumentById</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Get</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span 
class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">valBytes</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">getDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ps</span><span class="p">.</span><span class="nx">ByName</span><span class="p">(</span><span class="s">&quot;id&quot;</span><span class="p">)</span> <span 
class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocumentById</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;document&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>We've now got enough in place to test out these basics!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>init<span class="w"> </span>docdb $<span class="w"> </span>go<span class="w"> </span>mod<span class="w"> </span>tidy 
$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>./docdb <span class="m">2022</span>/03/28<span class="w"> </span><span class="m">19</span>:28:19<span class="w"> </span>Listening<span class="w"> </span>on<span class="w"> </span><span class="m">8080</span> </pre></div> <p>Now, in another terminal, insert a document:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Content-Type: application/json&#39;</span><span class="w"> </span>-d<span class="w"> </span><span class="s1">&#39;{&quot;name&quot;: &quot;Kevin&quot;, &quot;age&quot;: &quot;45&quot;}&#39;</span><span class="w"> </span>http://localhost:8080/docs <span class="o">{</span><span class="s2">&quot;body&quot;</span>:<span class="o">{</span><span class="s2">&quot;id&quot;</span>:<span class="s2">&quot;c458a3ce-9faf-4431-a058-d9ae2a1651e1&quot;</span><span class="o">}</span>,<span class="s2">&quot;status&quot;</span>:<span class="s2">&quot;ok&quot;</span><span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs/c458a3ce-9faf-4431-a058-d9ae2a1651e1 <span class="o">{</span><span class="s2">&quot;body&quot;</span>:<span class="o">{</span><span class="s2">&quot;document&quot;</span>:<span class="o">{</span><span class="s2">&quot;age&quot;</span>:<span class="s2">&quot;45&quot;</span>,<span class="s2">&quot;name&quot;</span>:<span class="s2">&quot;Kevin&quot;</span><span class="o">}}</span>,<span class="s2">&quot;status&quot;</span>:<span class="s2">&quot;ok&quot;</span><span class="o">}</span> </pre></div> <p>Perfect! Now let's implement search.</p> <h3 id="a-filter-language">A filter language</h3><p>First off we need to pick a filter language. Using a JSON data structure would be fine. 
We could require that the user POST to a search endpoint, with the JSON filter in the POST body.</p> <p>But <a href="https://lucene.apache.org/core/2_9_4/queryparsersyntax.html">Lucene</a> is a pretty simple language and we can implement enough parts of it easily. The result is more fun.</p> <p>In our simplification of Lucene there will only be key-value matches. Field names and field values can be quoted. They must be quoted if they contain spaces or colons, among other things. Key-value matches are separated by whitespace. They can only be AND-ed together and that is done implicitly.</p> <p>The following are some valid filters in our implementation:</p> <ul> <li><code>a:1</code></li> <li><code>b:fifteen a:&lt;3</code></li> <li><code>a.b:12</code></li> <li><code>title:"Which way?"</code></li> <li><code>" a key 2":tenant</code></li> <li><code>" flubber ":"blubber "</code></li> </ul> <p>Nested paths are specified using JSON path syntax (e.g. <code>a.b</code> would retrieve <code>4</code> in <code>{"a": {"b": 4, "d": 100}, "c": 8}</code>).</p> <h3 id="lexing-strings">Lexing strings</h3><p>Both keys and values are lexed as strings. If they start with a quote, we keep on accumulating all characters until the ending quote. 
Otherwise we accumulate until we stop seeing a digit, letter, or period.</p> <div class="highlight"><pre><span></span><span class="c1">// Handles either quoted strings or unquoted strings of only contiguous digits, letters, and periods</span> <span class="kd">func</span><span class="w"> </span><span class="nx">lexString</span><span class="p">(</span><span class="nx">input</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="o">++</span> <span class="w"> </span><span 
class="nx">foundEnd</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span> <span class="w"> </span><span class="c1">// TODO: handle nested quotes</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">foundEnd</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="p">])</span> <span class="w"> </span><span class="nx">index</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">foundEnd</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Expected end of quoted string&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// If unquoted, read as many contiguous digits/letters/periods as there are</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">[]</span><span class="kt">rune</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="kt">rune</span> <span class="w"> </span><span class="c1">// TODO: someone needs to validate there&#39;s not ...</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">input</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">input</span><span class="p">[</span><span 
class="nx">index</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!(</span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsLetter</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">unicode</span><span class="p">.</span><span class="nx">IsDigit</span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="nx">index</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">s</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span 
class="p">(</span><span class="s">&quot;No string found&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p class="note"> This is not something you get right without unit tests. I wrote unit tests for it while building this project. Always unit test tricky code where you're likely to have off-by-one errors! I had a bunch. </p><h3 id="query-parser">Query parser</h3><p>Now we can write the query parser. It first lexes a string for the key. Then it looks for the operator which can be one of <code>:</code> (meaning equality), <code>:&gt;</code> (meaning greater than), or <code>:&lt;</code> (meaning less than). It accumulates each key-value pair into an overall list of AND-ed arguments that make up the query.</p> <div class="highlight"><pre><span></span><span class="n">type</span><span class="w"> </span><span class="n">queryComparison</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">[]</span><span class="n">string</span> <span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="n">string</span> <span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="n">string</span> <span class="p">}</span> <span class="n">type</span><span class="w"> </span><span class="n">query</span><span class="w"> </span><span class="n">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ands</span><span class="w"> </span><span class="p">[]</span><span 
class="n">queryComparison</span> <span class="p">}</span> <span class="o">//</span><span class="w"> </span><span class="n">E</span><span class="o">.</span><span class="n">g</span><span class="o">.</span><span class="w"> </span><span class="n">q</span><span class="o">=</span><span class="n">a</span><span class="o">.</span><span class="n">b</span><span class="p">:</span><span class="mi">12</span> <span class="k">func</span><span class="w"> </span><span class="n">parseQuery</span><span class="p">(</span><span class="n">q</span><span class="w"> </span><span class="n">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="n">query</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">q</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="n">query</span><span class="p">{},</span><span class="w"> </span><span class="n">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">parsed</span><span class="w"> </span><span class="n">query</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">qRune</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[]</span><span class="n">rune</span><span class="p">(</span><span class="n">q</span><span class="p">)</span> 
<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Eat</span><span class="w"> </span><span class="n">whitespace</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">unicode</span><span class="o">.</span><span class="n">IsSpace</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Nothing</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="n">but</span><span class="w"> </span><span class="n">trailing</span><span class="w"> </span><span class="n">whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">qRune</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">&quot;Expected valid key, got [</span><span
class="si">%s</span><span class="s2">]: `</span><span class="si">%s</span><span class="s2">`&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Expect</span><span class="w"> </span><span class="n">some</span><span class="w"> </span><span class="n">operator</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">nextIndex</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">&quot;Expected colon at </span><span class="si">%d</span><span class="s2">, got: `</span><span class="si">%s</span><span class="s2">`&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nextIndex</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="p">:</span><span
class="o">=</span><span class="w"> </span><span class="s2">&quot;=&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">len</span><span class="p">(</span><span class="n">qRune</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;&gt;&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="w"> </span><span class="n">i</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">nextIndex</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">lexString</span><span class="p">(</span><span class="n">qRune</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">nil</span><span class="p">,</span><span class="w"> </span><span class="n">fmt</span><span
class="o">.</span><span class="n">Errorf</span><span class="p">(</span><span class="s2">&quot;Expected valid value, got [</span><span class="si">%s</span><span class="s2">]: `</span><span class="si">%s</span><span class="s2">`&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="p">,</span><span class="w"> </span><span class="n">string</span><span class="p">(</span><span class="n">qRune</span><span class="p">[</span><span class="n">nextIndex</span><span class="p">:]))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nextIndex</span> <span class="w"> </span><span class="n">argument</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">queryComparison</span><span class="p">{</span><span class="n">key</span><span class="p">:</span><span class="w"> </span><span class="n">strings</span><span class="o">.</span><span class="n">Split</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;.&quot;</span><span class="p">),</span><span class="w"> </span><span class="n">value</span><span class="p">:</span><span class="w"> </span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">op</span><span class="p">:</span><span class="w"> </span><span class="n">op</span><span class="p">}</span> <span class="w"> </span><span class="n">parsed</span><span class="o">.</span><span class="n">ands</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">parsed</span><span class="o">.</span><span class="n">ands</span><span class="p">,</span><span class="w"> </span><span class="n">argument</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="n">parsed</span><span class="p">,</span><span class="w"> </span><span class="n">nil</span> <span class="p">}</span> </pre></div> <p>Since we're already writing a real lexer we could do better than <code>strings.Split(key, ".")</code> when it comes to finding key path parts. But it isn't a huge deal at this stage. So we keep it simple.</p> <h3 id="query-matching">Query matching</h3><p>Now that we've got the query parser we need to implement an evaluator for the search endpoint. We need to be able to check whether a given document meets the filter.</p> <p>So we iterate over each argument and do the indicated comparison: equality, greater than or less than. If at any point the comparison fails, return false immediately. Otherwise if we got through all arguments and didn't return, there was a match!</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">q</span><span class="w"> </span><span class="nx">query</span><span class="p">)</span><span class="w"> </span><span class="nx">match</span><span class="p">(</span><span class="nx">doc</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">ands</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span 
class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getPath</span><span class="p">(</span><span class="nx">doc</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">key</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Handle equality</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;=&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">match</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Handle &lt;, &gt;</span> <span class="w"> </span><span class="nx">right</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseFloat</span><span class="p">(</span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="kt">float64</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">value</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">float64</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">float32</span><span class="p">:</span> <span class="w"> </span><span 
class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint8</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint16</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint32</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">uint64</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span 
class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int8</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int16</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int32</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">int64</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">float64</span><span class="p">(</span><span class="nx">t</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kt">string</span><span class="p">:</span> <span class="w"> </span><span class="nx">left</span><span class="p">,</span><span 
class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseFloat</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&gt;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> 
</span><span class="nx">right</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p class="note"> This bit of Go, which requires a separate case statement for every possible numeric type so I can convert it to a float, is really annoying. </p><p>The only additional part to call out in there is <code>getPath</code>. We need to be able to grab any path within an object since the user could have made a filter like <code>a.b:12</code>. So let's keep things simple (but less safe) and implement <code>getPath</code> by walking the path one part at a time in a loop.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">getPath</span><span class="p">(</span><span class="nx">doc</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">docSegment</span><span class="w"> </span><span class="kt">any</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">doc</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span 
class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">m</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">.(</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">m</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">docSegment</span><span class="p">,</span><span class="w"> </span><span 
class="kc">true</span> <span class="p">}</span> </pre></div> <p>A critical thing to point out is that filtering on arrays is not supported. Any filter that tries to enter an array will fail or return no results.</p> <h3 id="search">Search</h3><p>Now that we've got all the tools in place we can implement the search endpoint. We'll just iterate over all documents in the database and return all documents that match the filter.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">q</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseQuery</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;q&quot;</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">[]</span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span 
class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">(),</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">:</span><span class="w"> </span><span 
class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span> <span class="w"> </span><span class="s">&quot;body&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;documents&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;count&quot;</span><span class="p">:</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">documents</span><span class="p">)},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Not bad! 
Let's try it out:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>./docdb </pre></div> <p>And in another terminal, try out the search endpoint with no filter:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;age&quot;</span>:<span class="w"> </span><span class="s2">&quot;45&quot;</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span><span class="s2">&quot;Kevin&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;c458a3ce-9faf-4431-a058-d9ae2a1651e1&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> </pre></div> <p>With an equality filter:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> 
</span><span class="s1">&#39;q=name:Mel&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">0</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span>null <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=name:Kevin&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;age&quot;</span>:<span class="w"> </span><span class="s2">&quot;45&quot;</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span><span class="s2">&quot;Kevin&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;c458a3ce-9faf-4431-a058-d9ae2a1651e1&quot;</span> <span 
class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> </pre></div> <p>And with greater than/less than filters:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=age:&lt;12&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">0</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span>null <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> $<span class="w"> </span>curl<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=age:&lt;200&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;count&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;documents&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span 
class="s2">&quot;body&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;age&quot;</span>:<span class="w"> </span><span class="s2">&quot;45&quot;</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span><span class="s2">&quot;Kevin&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;c458a3ce-9faf-4431-a058-d9ae2a1651e1&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;status&quot;</span>:<span class="w"> </span><span class="s2">&quot;ok&quot;</span> <span class="o">}</span> </pre></div> <p>Sweet.</p> <h3 id="benchmarking">Benchmarking</h3><p>Now let's try inserting a few hundred thousand rows of real-world data. Grab <code>movies.json</code> from the <a href="https://github.com/prust/wikipedia-movie-data">Wikipedia Movie Data repo</a>. This dataset only has 28,000 rows. But we can insert it multiple times. 
If we filter by movie name and movie year we'll be looking at only a small subset of the data but enough that we can get a sense of performance.</p> <p>Here's a basic script to ingest that data a bunch of times once you've downloaded the file.</p> <div class="highlight"><pre><span></span><span class="ch">#!/usr/bin/env bash</span> <span class="nb">set</span><span class="w"> </span>-e <span class="nv">count</span><span class="o">=</span><span class="m">50</span> <span class="k">for</span><span class="w"> </span>run<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>seq<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="s2">&quot;</span><span class="nv">$count</span><span class="s2">&quot;</span><span class="k">)</span><span class="p">;</span><span class="w"> </span><span class="k">do</span> <span class="w"> </span>jq<span class="w"> </span>-c<span class="w"> </span><span class="s1">&#39;.[]&#39;</span><span class="w"> </span><span class="s2">&quot;</span><span class="nv">$1</span><span class="s2">&quot;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="nv">IFS</span><span class="o">=</span><span class="w"> </span><span class="nb">read</span><span class="w"> </span>-r<span class="w"> </span>data<span class="p">;</span><span class="w"> </span><span class="k">do</span> <span class="w"> </span>curl<span class="w"> </span>-X<span class="w"> </span>POST<span class="w"> </span>-H<span class="w"> </span><span class="s1">&#39;Content-Type: application/json&#39;</span><span class="w"> </span>-d<span class="w"> </span><span class="s2">&quot;</span><span class="nv">$data</span><span class="s2">&quot;</span><span class="w"> </span>http://localhost:8080/docs <span class="w"> </span><span class="k">done</span> <span class="k">done</span> </pre></div> <p>Start it up and wait as long as you can. 
:)</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>scripts/load_array.sh $<span class="w"> </span>./scripts/load_array.sh<span class="w"> </span>movies.json </pre></div> <p>You can check how many items are in the database like so:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.body.count&#39;</span> <span class="m">12649</span> </pre></div> <p>Once you have a few hundred thousand documents you'll notice that exact equality queries take longer:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=&quot;year&quot;:1918&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.body.count&#39;</span> <span class="m">1152</span> curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=&quot;year&quot;:1918&#39;</span><span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">0</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.992<span class="w"> </span>total </pre></div> <p>And you think: there are hundreds of thousands of documents, but only about 1,000 of them match the value I'm asking for. Shouldn't it be possible to grab them in less than one whole second? Or, better, in a time that doesn't grow with the number of documents in the database?</p> <p>Yes. Yes it is possible.</p> <h3 id="indexes">Indexes</h3><p>Document databases often index everything. We're going to do that. For every path in a document (that isn't a path within an array) we're going to store the path and the value of the document at that path.</p> <p>First we'll open a second database that we'll use to store all of these path-value pairs.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span><span class="w"> </span><span class="c1">// Primary data</span> <span class="w"> </span><span class="nx">indexDb</span><span class="w"> </span><span class="o">*</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">DB</span><span class="w"> </span><span class="c1">// Index data</span> <span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="nx">database</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">server</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">server</span><span class="p">{</span><span class="nx">db</span><span class="p">:</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">port</span><span class="p">:</span><span class="w"> </span><span class="nx">port</span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="nx">database</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Open</span><span 
class="p">(</span><span class="nx">database</span><span class="o">+</span><span class="s">&quot;.index&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Options</span><span class="p">{})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <p>Then when we insert, we'll call an <code>index</code> function to generate all path-value pairs and store them in this second database.</p> <p>The index database will store each path-value pair as a key. Each value will be the comma-separated list of IDs of the documents that have that path-value pair.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">index</span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">pv</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> 
</span><span class="nx">pathValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">pv</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">ErrNotFound</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not look up pathvalue [%#v]: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pathValue</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="cm">/* Copy the value: the slice returned by Get is only valid until closer.Close() and must not be modified */</span> <span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="kc">nil</span><span class="p">),</span><span class="w"> </span><span class="nx">idsString</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsString</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">idsString</span><span class="p">),</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">existingId</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">existingId</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">found</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nb">byte</span><span class="p">(</span><span class="s">&quot;,&quot;</span><span class="o">+</span><span class="nx">id</span><span class="p">)</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">closer</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not close: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">),</span><span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Could not update index: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Keeping things simple we'll also implement this <code>getPathValues</code> helper recursively:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">obj</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">,</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">pvs</span><span class="w"> 
</span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">obj</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="cm">/* Build the full dotted path first so the prefix carries through every level of nesting */</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;.&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">key</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">val</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">:</span> <span class="w"> </span><span class="nx">pvs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">pvs</span><span class="p">,</span><span class="w"> </span><span class="nx">getPathValues</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">)</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="p">[]</span><span class="kd">interface</span><span class="p">{}:</span> <span class="w"> </span><span class="c1">// Can&#39;t handle arrays</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pvs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">pvs</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s=%v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">pvs</span> <span class="p">}</span> </pre></div> <p>We'll add one line to <code>s.addDocument</code> to call this <code>index</code> function.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">addDocument</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span 
class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">NewDecoder</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dec</span><span class="p">.</span><span class="nx">Decode</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// New unique id for the document</span> <span class="w"> </span><span 
class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">uuid</span><span class="p">.</span><span class="nx">New</span><span class="p">().</span><span class="nx">String</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">index</span><span class="p">(</span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Marshal</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Set</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">),</span><span class="w"> </span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span 
class="nx">pebble</span><span class="p">.</span><span class="nx">Sync</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>And we'll add a <code>reindex</code> function to be called in <code>main</code> to handle any documents that were ingested and not indexed (i.e. 
all the ones we already inserted).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">reindex</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span 
class="p">(),</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Unable to parse bad document, %s: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">index</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span><span class="w"> </span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newServer</span><span class="p">(</span><span class="s">&quot;docdb.data&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;8080&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">reindex</span><span class="p">()</span> <span class="w"> </span><span class="nx">router</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">New</span><span class="p">()</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">POST</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">addDocument</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">searchDocuments</span><span class="p">)</span> <span class="w"> </span><span class="nx">router</span><span class="p">.</span><span class="nx">GET</span><span class="p">(</span><span class="s">&quot;/docs/:id&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocument</span><span class="p">)</span> <span 
class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Listening on &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">)</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="nx">http</span><span class="p">.</span><span class="nx">ListenAndServe</span><span class="p">(</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">s</span><span class="p">.</span><span class="nx">port</span><span class="p">,</span><span class="w"> </span><span class="nx">router</span><span class="p">))</span> <span class="p">}</span> </pre></div> <h3 id="using-the-index">Using the index</h3><p>When there is an equality filter we can look the equality filter up in the index database. Our filter language only supports AND-ed arguments. So the results matching the overall filter must be the set intersection of ids that match each individual equality filter. 
Greater-than and less-than filters are applied afterward: once we have fetched every id that matches the equality filters, we drop the documents that fail the range filters.</p> <p>If no ids are found in the index database meeting all equality filters then we'll fall back to the full table scan we already have.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">searchDocuments</span><span class="p">(</span><span class="nx">w</span><span class="w"> </span><span class="nx">http</span><span class="p">.</span><span class="nx">ResponseWriter</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">http</span><span class="p">.</span><span class="nx">Request</span><span class="p">,</span><span class="w"> </span><span class="nx">ps</span><span class="w"> </span><span class="nx">httprouter</span><span class="p">.</span><span class="nx">Params</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">q</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseQuery</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;q&quot;</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">isRange</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="nx">idsArgumentCount</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">int</span><span class="p">{}</span> <span class="w"> </span><span class="nx">nonRangeArguments</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">argument</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">ands</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;=&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">nonRangeArguments</span><span class="o">++</span> <span class="w"> </span><span class="nx">ids</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="nx">s</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%s=%v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">argument</span><span class="p">.</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;.&quot;</span><span class="p">),</span><span class="w"> </span><span class="nx">argument</span><span class="p">.</span><span class="nx">value</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ids</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">idsArgumentCount</span><span 
class="p">[</span><span class="nx">id</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsArgumentCount</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">idsArgumentCount</span><span class="p">[</span><span class="nx">id</span><span class="p">]</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">isRange</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">idsArgumentCount</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">count</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">nonRangeArguments</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">idsInAll</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">[]</span><span class="kt">any</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">URL</span><span class="p">.</span><span class="nx">Query</span><span class="p">().</span><span class="nx">Get</span><span class="p">(</span><span class="s">&quot;skipIndex&quot;</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsInAll</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">idsInAll</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">document</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">getDocumentById</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">id</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isRange</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">:</span><span 
class="w"> </span><span class="nx">id</span><span class="p">,</span> <span class="w"> </span><span class="s">&quot;body&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">document</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">iter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">db</span><span class="p">.</span><span class="nx">NewIter</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">First</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Valid</span><span class="p">();</span><span class="w"> </span><span class="nx">iter</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">document</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">json</span><span class="p">.</span><span 
class="nx">Unmarshal</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Value</span><span class="p">(),</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">document</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">q</span><span class="p">.</span><span class="nx">match</span><span class="p">(</span><span class="nx">document</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">documents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">iter</span><span class="p">.</span><span class="nx">Key</span><span class="p">()),</span> <span class="w"> </span><span class="s">&quot;body&quot;</span><span class="p">:</span><span class="w"> </span><span 
class="nx">document</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">jsonResponse</span><span class="p">(</span><span class="nx">w</span><span class="p">,</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">any</span><span class="p">{</span><span class="s">&quot;documents&quot;</span><span class="p">:</span><span class="w"> </span><span class="nx">documents</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;count&quot;</span><span class="p">:</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">documents</span><span class="p">)},</span><span class="w"> </span><span class="kc">nil</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>The last unimplemented part is the <code>lookup</code> helper. 
Given a path-value pair it checks the database for IDs that match that pair.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">server</span><span class="p">)</span><span class="w"> </span><span class="nx">lookup</span><span class="p">(</span><span class="nx">pathValue</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">idsString</span><span class="p">,</span><span class="w"> </span><span class="nx">closer</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">indexDb</span><span class="p">.</span><span class="nx">Get</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">pathValue</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">pebble</span><span class="p">.</span><span class="nx">ErrNotFound</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span 
class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not look up pathvalue [%#v]: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">pathValue</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">closer</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">closer</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">idsString</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">idsString</span><span class="p">),</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>We're done. Finally! 
Let's build it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>./docdb </pre></div> <p>(This is going to take a while, since the server reindexes on startup.)</p> <p>Once the server is ready we can run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=&quot;year&quot;:1918&#39;</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>jq<span class="w"> </span><span class="s1">&#39;.body.count&#39;</span> <span class="m">1280</span> curl<span class="w"> </span>-s<span class="w"> </span>--get<span class="w"> </span>http://localhost:8080/docs<span class="w"> </span>--data-urlencode<span class="w"> </span><span class="s1">&#39;q=&quot;year&quot;:1918&#39;</span><span class="w"> </span><span class="m">0</span>.01s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">29</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.029<span class="w"> </span>total </pre></div> <p>Hey that's not bad.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Hey here&#39;s a new blog post on writing a document database from scratch with support for Lucene-like queries and basic indexes in less than 500 lines of Go<a href="https://t.co/M3js6Pj9h0">https://t.co/M3js6Pj9h0</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1508546397943046150?ref_src=twsrc%5Etfw">March 28, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/documentdb.htmlMon, 28 Mar 2022 00:00:00 +0000Speeding up Go's builtin JSON 
encoder up to 55% for large arrays of objectshttp://notes.eatonphil.com/improving-go-json-encoding-performance-for-large-arrays-of-objects.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-03-03-improving-go-json-encoding-performance-for-large-arrays-of-objects.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-03-03-improving-go-json-encoding-performance-for-large-arrays-of-objects.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/improving-go-json-encoding-performance-for-large-arrays-of-objects.htmlThu, 03 Mar 2022 00:00:00 +0000SMTP protocol basics from scratch in Go: receiving email from Gmailhttp://notes.eatonphil.com/handling-email-from-gmail-smtp-protocol-basics.html<p>I've never run my own mail server before. Before today I had no clue how email worked under the hood, beyond the few times I've set up mail clients.</p> <p>I've heard many times how hard it is to <em>send</em> mail from a self-hosted server (because of spam filters). But how hard can it be to hook up DNS to my personal server and receive email at my domain sent from Gmail or another real-world client?</p> <p>I knew it would be simpler to just send local mail to a local mail server with a local mail client but that didn't seem as real. If I could send email from my Gmail account and receive it on my server I'd be happy.</p> <p>I spent the afternoon digging into this. All code is <a href="https://github.com/eatonphil/gomail">available on GitHub</a>. The "live stream" is in the <a href="https://discord.multiprocess.io">Multiprocess Discord</a>'s &#35;hacking-networks channel.</p> <h3 id="dns">DNS</h3><p>First I bought a domain. (I needed to be able to mess around with records without blowing up anything important.)</p> <p>I knew that MX records controlled where mail for a domain is sent. 
But I had to <a href="https://en.wikipedia.org/wiki/MX_record">look up the specifics</a>. You need to create an MX record that points to an A or AAAA record. So you need both an MX record and an A or AAAA record.</p> <p><img src="/dnsrecords.png" alt="MX and A record settings"></p> <p>Done.</p> <h3 id="firewall">Firewall</h3><p>The firewall on Fedora is aggressive. Gotta open up port 25.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--zone<span class="o">=</span>dmz<span class="w"> </span>--add-port<span class="o">=</span><span class="m">25</span>/tcp<span class="w"> </span>--permanent $<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--zone<span class="o">=</span>public<span class="w"> </span>--add-port<span class="o">=</span><span class="m">25</span>/tcp<span class="w"> </span>--permanent $<span class="w"> </span>sudo<span class="w"> </span>firewall-cmd<span class="w"> </span>--reload </pre></div> <p>I don't understand what zones are here.</p> <h3 id="what-protocols?">What protocols?</h3><p>I knew that you send email with SMTP and you read it with POP3 or IMAP. 
But it hadn't clicked before that the mail server only has to speak SMTP: if you only ever read mail on the server itself (which is of course impractical in the real world) you don't need POP3 or IMAP.</p> <p><img src="https://cdn.educba.com/academy/wp-content/uploads/2019/07/smtp-protocol.png" alt="SMTP vs POP3"></p> <p>So to meaningfully receive email from Gmail all I needed to do was implement SMTP.</p> <h3 id="smtp">SMTP</h3><p>First I found the <a href="https://datatracker.ietf.org/doc/html/rfc5321">RFC for SMTP</a> (or one of them anyway) and <a href="https://en.wikipedia.org/wiki/Simple_Mail_Transfer_Protocol">the Wikipedia page for it</a>.</p> <p>To start, I'd need to run a TCP server on port 25.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;errors&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;net&quot;</span> <span class="w"> </span><span class="s">&quot;strconv&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;[ERROR] %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">logInfo</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span 
class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;[INFO] %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">)</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">message</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">clientDomain</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">smtpCommands</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">atmHeaders</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">body</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">date</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">subject</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">connection</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">conn</span><span class="w"> </span><span class="nx">net</span><span 
class="p">.</span><span class="nx">Conn</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="p">}</span> <span class="c1">// TODO</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">Listen</span><span class="p">(</span><span class="s">&quot;tcp&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;0.0.0.0:25&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span 
class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Listening&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">Accept</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">connection</span><span class="p">{</span><span class="nx">conn</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">}</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">handle</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Just a basic TCP server that passes off 
connections inside a goroutine.</p> <h3 id="greeting">Greeting</h3><p>After starting a connection, the server must send a greeting. The successful greeting response code is <code>220</code>. It can optionally be followed by additional text. Like most commands in SMTP, it must end with CRLF (<code>\r\n</code>).</p> <p>So we'll add a helper function for writing lines that end in CRLF:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">writeLine</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;\r\n&quot;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Write</span><span class="p">([]</span><span class="nb">byte</span><span class="p">(</span><span class="nx">msg</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> 
</span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">msg</span><span class="p">[</span><span class="nx">n</span><span class="p">:]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And then we'll send that <code>220</code> in the <code>handle</code> function.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">handle</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Connection accepted&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">&quot;220&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Awaiting EHLO&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// TODO</span> </pre></div> <h3 id="ehlo">EHLO</h3><p>Next we need to be able to read requests from the client. We'll write a helper that reads until the next CRLF. We'll keep a buffer of unread bytes in case we accidentally get bytes past the next CRLF. We'll store that buffer in the connection object.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readLine</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span> <span class="w"> </span><span 
class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// If end of line</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span 
class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// i-1 because drop the CRLF, no one cares after this</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">:]</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now back in the <code>handle</code> function we can read a line from the client. 
From the RFC we can see it should be <code>HELO</code> or <code>EHLO</code>. Both sendmail locally and Gmail only send <code>EHLO</code> though so we'll just check for that.</p> <p><img src="/ehloresponse.png" alt="EHLO response format"></p> <p>So we'll validate the message sent is an <code>EHLO</code> and then we'll send back a <code>250</code> with a space after it. We can ignore the rest of that response grammar since we don't have additional keywords we want to send to the client.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">&quot;Awaiting EHLO&quot;</span><span class="p">)</span> <span class="w"> </span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">readLine</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="err">!</span><span class="n">strings</span><span class="p">.</span><span class="n">HasPrefix</span><span class="p">(</span><span class="n">line</span><span class="p">,</span><span class="w"> </span><span class="ss">&quot;EHLO&quot;</span><span class="p">)</span><span 
class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">errors</span><span class="p">.</span><span class="k">New</span><span class="p">(</span><span class="ss">&quot;Expected EHLO got: &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">line</span><span class="p">))</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">msg</span><span class="w"> </span><span class="err">:</span><span class="o">=</span><span class="w"> </span><span class="n">message</span><span class="err">{</span> <span class="w"> </span><span class="nl">smtpCommands</span><span class="p">:</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="n">string</span><span class="err">{}</span><span class="p">,</span> <span class="w"> </span><span class="nl">atmHeaders</span><span class="p">:</span><span class="w"> </span><span class="k">map</span><span class="o">[</span><span class="n">string</span><span class="o">]</span><span class="n">string</span><span class="err">{}</span><span class="p">,</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="n">clientDomain</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">line</span><span class="o">[</span><span class="n">len(&quot;EHLO &quot;):</span><span class="o">]</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">&quot;Received EHLO&quot;</span><span class="p">)</span> <span class="w"> </span><span class="n">err</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">writeLine</span><span class="p">(</span><span class="ss">&quot;250 &quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">nil</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logError</span><span class="p">(</span><span class="n">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">logInfo</span><span class="p">(</span><span class="ss">&quot;Done EHLO&quot;</span><span class="p">)</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">TODO</span> </pre></div> <h3 id="additional-commands">Additional commands</h3><p>Next up there are a few commands we need to read before we get to the message body. These include the recipient and the sender address. These are formatted vaguely like HTTP headers. They have a key on the left side of a colon and a value on the right. They may have a required order too; I'm not sure.</p> <p>In response to the commands we'll send a <code>250 OK</code>, although I'm not sure where in the RFC that is suggested.</p> <p>In our code we'll just keep looking for these commands until we find the <code>DATA</code> command. This indicates the body is to follow. 
And to this command we respond with a <code>354</code> instead of a <code>250 OK</code>.</p> <p><img src="/dataresponse.png" alt="DATA response"></p> <p>In code:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Done EHLO&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readLine</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pieces</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">SplitN</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;:&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> 
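<span class="w"> </span><span class="nx">smtpCommand</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">pieces</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="c1">// Special command without a value</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">smtpCommand</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;DATA&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">&quot;354&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Any other line without a colon has no value; bail out instead of panicking on pieces[1]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">pieces</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Expected command with a value, got: &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">line</span><span class="p">))</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">smtpValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pieces</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">smtpCommands</span><span 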
class="p">[</span><span class="nx">smtpCommand</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">smtpValue</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Got command: &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">line</span><span class="p">)</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">&quot;250 OK&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Done SMTP commands, reading ARPA text message headers&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// TODO</span> </pre></div> <h3 id="message-body,-headers">Message body, headers</h3><p>Now that we've seen the <code>DATA</code> command we are within <em>a</em> message body. Within this body we still have to read some additional headers.</p> <p>Through trial-and-error I know to look for some headers like <code>Subject</code>. 
By searching the RFC I noticed a reference to <a href="https://datatracker.ietf.org/doc/html/rfc822">RFC 822</a> where these headers are defined.</p> <p><img src="/subject.png" alt="ARPA text message headers"></p> <p>These are ARPA internet text message headers apparently. They also look like HTTP headers but unlike HTTP headers they can span multiple lines. This stumped me for a bit.</p> <p><img src="/longheaders.png" alt="Multi-line headers"></p> <p>I decided to write a new <code>readLine</code> function that would specifically look for these possibly multi-line headers where a CRLF followed by whitespace isn't a line delimiter.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readMultiLine</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">noMoreReads</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span 
class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39; &#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;\t&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// i-2 because drop the CRLF, no one cares after this</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">])</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> 
</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="p">:]</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">noMoreReads</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">noMoreReads</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="c1">// If this gets here more than once it&#39;s going to be an infinite loop</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">4</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span><span class="w"> </span><span 
class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;.&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[</span><span class="nx">i</span><span class="o">-</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span> <span class="p">}</span> </pre></div> <p>Now back in the <code>handle</code> function we can read through all of these headers. 
According to RFC 822, we're done when we see a double CRLF, which in our code will show up as an empty line.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Done SMTP headers, reading ARPA text message headers&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readMultiLine</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">TrimSpace</span><span class="p">(</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">pieces</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">SplitN</span><span class="p">(</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;: &quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">pieces</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Skip malformed header lines that have no colon-space separator</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToUpper</span><span class="p">(</span><span class="nx">pieces</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="nx">atmValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">pieces</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">atmHeaders</span><span class="p">[</span><span class="nx">atmHeader</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;SUBJECT&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">subject</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;TO&quot;</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">to</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;FROM&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">atmHeader</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;DATE&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">date</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">atmValue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Done ARPA text message headers, reading body&quot;</span><span class="p">)</span> <span class="w"> </span><span class="c1">// TODO</span> </pre></div> <h3 id="body,-for-real">Body, for real</h3><p>We're finally at the email body as the user typed it. 
According to the SMTP RFC the body ends with a CRLF followed by a dot (period) followed by a CRLF.</p> <p>So we'll write another helper to read until this marker.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="nx">readToEndOfBody</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">isBodyClose</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">[:</span><span class="nx">i</span><span class="o">-</span><span class="mi">4</span><span class="p">]),</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">make</span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="mi">1024</span><span class="p">)</span> <span class="w"> </span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">conn</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">[:</span><span class="nx">n</span><span class="p">]</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we can finish up the <code>handle</code> function.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Done ARPA text message 
headers, reading body&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">readToEndOfBody</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Got body (%d bytes)&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">))</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">writeLine</span><span class="p">(</span><span class="s">&quot;250 OK&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logError</span><span 
class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Message:\n%s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">.</span><span class="nx">body</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">logInfo</span><span class="p">(</span><span class="s">&quot;Connection closed&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h3 id="compile,-setcap,-run,-and-send">Compile, setcap, run, and send</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>sudo<span class="w"> </span>setcap<span class="w"> </span><span class="s1">&#39;cap_net_bind_service=+ep&#39;</span><span class="w"> </span>./gomail $<span class="w"> </span>./gomail </pre></div> <p>And send an email in Gmail! It can be to any user since we haven't implemented anything regarding users. 
I'll send <code>What hath god wrought</code> as the subject and message to <code>[email protected]</code>.</p> <p>And I see:</p> <div class="highlight"><pre><span></span><span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:17:19<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Listening <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Connection<span class="w"> </span>accepted <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Awaiting<span class="w"> </span>EHLO <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Received<span class="w"> </span>EHLO <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>EHLO <span class="m">2022</span>/02/21<span class="w"> </span><span 
class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>header:<span class="w"> </span>MAIL<span class="w"> </span>FROM:&lt;[email protected]&gt; <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>header:<span class="w"> </span>RCPT<span class="w"> </span>TO:&lt;[email protected]&gt; <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>SMTP<span class="w"> </span>headers,<span class="w"> </span>reading<span class="w"> </span>ARPA<span class="w"> </span>text<span class="w"> </span>message<span class="w"> </span>headers <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Done<span class="w"> </span>ARPA<span class="w"> </span>text<span class="w"> </span>message<span class="w"> </span>headers,<span class="w"> </span>reading<span class="w"> </span>body <span 
class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Got<span class="w"> </span>body<span class="w"> </span><span class="o">(</span><span class="m">256</span><span class="w"> </span>bytes<span class="o">)</span> <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span class="w"> </span>Message: --000000000000c4758905d87ddb81 Content-Type:<span class="w"> </span>text/plain<span class="p">;</span><span class="w"> </span><span class="nv">charset</span><span class="o">=</span><span class="s2">&quot;UTF-8&quot;</span> What<span class="w"> </span>hath<span class="w"> </span>god<span class="w"> </span>wrought --000000000000c4758905d87ddb81 Content-Type:<span class="w"> </span>text/html<span class="p">;</span><span class="w"> </span><span class="nv">charset</span><span class="o">=</span><span class="s2">&quot;UTF-8&quot;</span> &lt;div<span class="w"> </span><span class="nv">dir</span><span class="o">=</span><span class="s2">&quot;ltr&quot;</span>&gt;What<span class="w"> </span>hath<span class="w"> </span>god<span class="w"> </span>wrought&lt;/div&gt; --000000000000c4758905d87ddb81-- <span class="m">2022</span>/02/21<span class="w"> </span><span class="m">02</span>:19:13<span class="w"> </span><span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="m">1</span>:<span class="w"> </span><span class="m">209</span>.85.222.47:40695<span class="o">]</span><span 
class="w"> </span>Connection<span class="w"> </span>closed </pre></div> <p>Which is pretty sweet!</p> <h3 id="multipart-wut">Multipart wut</h3><p>Ok this body still clearly has some format. And if we dump the ARPA text message headers we notice that Gmail 1) sets a Content-Type header and 2) its value is <code>multipart/alternative</code>. Content-Type isn't defined as a header in RFC 822; it comes from the later MIME extension RFCs (RFC 2045 defines the header and RFC 2046 defines the <code>multipart</code> types).</p> <p>In any case this looks like multipart bodies in HTTP. I don't want to deal with that so I'm just going to stop here.</p> <p>But I <em>am</em> curious about text-only email systems. So I <code>sudo dnf install php sendmail</code> and write a quick PHP script (thanks to @Josh on Discord for the suggestion):</p> <div class="highlight"><pre><span></span><span class="cp">&lt;?php</span> <span class="nb">mail</span><span class="p">(</span><span class="s2">&quot;[email protected]&quot;</span><span class="p">,</span> <span class="s2">&quot;What hath god wrought&quot;</span><span class="p">,</span> <span class="s2">&quot;What hath god wrought&quot;</span><span class="p">,</span> <span class="s2">&quot;&quot;</span><span class="p">);</span> <span class="cp">?&gt;</span> </pre></div> <p>And run it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>php<span class="w"> </span>test.php </pre></div> <p>And in my <code>gomail</code> window I see:</p> <div class="highlight"><pre><span></span><span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">17</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="n">Listening</span> <span 
class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">accepted</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Awaiting</span><span class="w"> </span><span class="n">EHLO</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Received</span><span class="w"> </span><span class="n">EHLO</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span 
class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">EHLO</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="nl">header</span><span class="p">:</span><span class="w"> </span><span class="n">MAIL</span><span class="w"> </span><span class="k">From</span><span class="err">:</span><span class="o">&lt;</span><span class="n">phil</span><span class="nv">@dev1</span><span class="p">.</span><span class="n">eatonphil</span><span class="p">.</span><span class="n">com</span><span class="o">&gt;</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="nl">header</span><span 
class="p">:</span><span class="w"> </span><span class="n">RCPT</span><span class="w"> </span><span class="k">To</span><span class="err">:</span><span class="o">&lt;</span><span class="n">morse</span><span class="nv">@binutils</span><span class="p">.</span><span class="n">org</span><span class="o">&gt;</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">SMTP</span><span class="w"> </span><span class="n">headers</span><span class="p">,</span><span class="w"> </span><span class="n">reading</span><span class="w"> </span><span class="n">ARPA</span><span class="w"> </span><span class="nc">text</span><span class="w"> </span><span class="n">message</span><span class="w"> </span><span class="n">headers</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">ARPA</span><span class="w"> </span><span class="nc">text</span><span class="w"> </span><span class="n">message</span><span class="w"> </span><span 
class="n">headers</span><span class="p">,</span><span class="w"> </span><span class="n">reading</span><span class="w"> </span><span class="n">body</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="n">Got</span><span class="w"> </span><span class="n">body</span><span class="w"> </span><span class="p">(</span><span class="mi">21</span><span class="w"> </span><span class="n">bytes</span><span class="p">)</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="nl">Message</span><span class="p">:</span> <span class="n">What</span><span class="w"> </span><span class="n">hath</span><span class="w"> </span><span class="n">god</span><span class="w"> </span><span class="n">wrought</span> <span class="mi">2022</span><span class="o">/</span><span class="mi">02</span><span class="o">/</span><span class="mi">21</span><span class="w"> </span><span class="mi">02</span><span class="err">:</span><span class="mi">24</span><span class="err">:</span><span class="mi">18</span><span class="w"> </span><span class="o">[</span><span 
class="n">INFO</span><span class="o">]</span><span class="w"> </span><span class="o">[</span><span class="n">1: 127.0.0.1:45102</span><span class="o">]</span><span class="w"> </span><span class="k">Connection</span><span class="w"> </span><span class="n">closed</span> </pre></div> <p>And I'm happy to call it a night.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post on building an SMTP server from scratch in Go that is correctly enough hooked up you can receive emails sent from Gmail to it!<br><br>Good fun and some learning too.<a href="https://t.co/8pYkkAbFnI">https://t.co/8pYkkAbFnI</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1495586245896028160?ref_src=twsrc%5Etfw">February 21, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> <p>p.s. if you want to see more networking software/hardware internals check out <a href="https://reddit.com/r/networkdevelopment">/r/NetworkDevelopment</a>.</p> http://notes.eatonphil.com/handling-email-from-gmail-smtp-protocol-basics.htmlSun, 20 Feb 2022 00:00:00 +0000The world of PostgreSQL wire compatibilityhttp://notes.eatonphil.com/the-world-of-postgresql-wire-compatibility.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-02-08-the-world-of-postgresql-wire-compatibility.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-02-08-the-world-of-postgresql-wire-compatibility.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/the-world-of-postgresql-wire-compatibility.htmlTue, 08 Feb 2022 00:00:00 +0000How to recommend books, or, stop recommending SICPhttp://notes.eatonphil.com/recommending-a-book.html<p>Many "must-read" books are not well-written. 
I <a href="https://www.goodreads.com/user/show/50930981-phil-eaton">try to read a lot</a>, but I still have a low tolerance for bad writing and bad editing. I write this post both to discourage thoughtless recommendations and to encourage the receivers of bad recommendations.</p> <p>For software developers, Structure and Interpretation of Computer Programs is a prime example. Written for freshmen at MIT, it is ostensibly an entry-level text. But it requires such a level of competence in math and physics, and the prose itself is so dense and archaic, that I couldn't imagine suggesting it to anyone.</p> <p>And yet it is one of the most recommended books for developers.</p> <p>This is not to say that SICP is a bad book or that you should not read it. I just don't think it should ever be suggested to anyone.</p> <h3 id="goal">Goal</h3><p>The core goal of a book recommendation is for the reader to get enjoyment or education from it. If you can't continue or finish a book, you get nothing from it.</p> <p>You, the recommender, diminish your impact if you can only recommend books that people won't continue or finish.</p> <h4 id="non-goal">Non-goal</h4><p>Some people have the capacity to read and love challenging books. If that is you, you are not the audience of this post. I don't think you'd disagree that most people are not like you.</p> <h3 id="why">Why</h3><p>I have a few not-mutually-exclusive guesses as to why "must-read" books are often poorly written.</p> <p>One guess is intelligence signalling: it is human nature for a person to suggest a book in an attempt to make herself look smart rather than to best assist the person asking for a suggestion.</p> <p>Another guess is that most people don't read enough to have a good feel for better or worse writing and editing.</p> <p>And a final guess is that books that are worth reading might not always be well-written. This is the most unfortunate guess of all. 
I don't disagree that sometimes it is necessary to learn from poorly-written books. But I begrudge this because of how much joy I get from reading well-written books, fiction and non-fiction.</p> <p>I have a feeling my guesses apply to recommendations in general: music, art, film, musicals, restaurants, etc.</p> <h3 id="instead">Instead</h3><p>My suggestion then to folks who are in the position of giving recommendations:</p> <ol> <li>If you had a hard time reading a book or it took you too long to read it (yes, this threshold is different for everyone), don't recommend it</li> <li>Don't be scared to recommend nothing, or to recommend against (rather than for) a certain book</li> <li>Read more books</li> </ol> <p>And definitely don't recommend books you haven't read.</p> <h3 id="mea-culpa">Mea culpa</h3><p>I've definitely done a bad job recommending books in the past, including recommending books I haven't read. I've been trying to do better in the last 5 years or so.</p> <p>What do you think?</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post: a bit of flame bait on how to recommend books and why so many must-read books are impossible to read.<br><br>Or: stop recommending SICP.<br><br>If you love challenging books, you are neither the norm nor the audience of this post. 😀<a href="https://t.co/ZU92kgr4Kf">https://t.co/ZU92kgr4Kf</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1488204810541219840?ref_src=twsrc%5Etfw">January 31, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/recommending-a-book.htmlMon, 31 Jan 2022 00:00:00 +0000Bootloader basicshttp://notes.eatonphil.com/bootloader-basics.html<p>I spent a few days playing around with bootloaders for the first time. This post builds up to a text editor with a few keyboard shortcuts. 
I'll be giving a virtual talk based on this work at <a href="https://www.meetup.com/hackernights/">Hacker Nights on Jan 27</a>.</p> <p>There are definitely bugs. But it's hard to find intermediate resources for bootloader programming so maybe parts of this will be useful.</p> <p>If you already know the basics and the intermediates and just want a fantastic intermediate+ tutorial, maybe try <a href="https://0x00sec.org/t/realmode-assembly-writing-bootable-stuff-part-5/3667">this</a>. It is very good.</p> <p>The code for this post is available on <a href="https://github.com/eatonphil/bootloaders">GitHub</a>, but it's more of a mess than my usual project.</p> <h3 id="motivation:-snake">Motivation: Snake</h3><p>Remember the <a href="https://www.quaxio.com/bootloader_retro_game_tweet/">snake bootloader in a tweet</a> from a few years ago?</p> <p>Install qemu (on macOS or Linux) and nasm, then copy the <code>snake.asm</code> source code from that blog post to disk.</p> <div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">snake.asm</span> <span class="w"> </span><span class="err">[</span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span><span class="p">]</span><span class="w"> </span><span class="c1">; Pragma, tells the assembler that we</span> <span class="w"> </span><span class="c1">; are in 16 bit mode (which is the state</span> <span class="w"> </span><span class="c1">; of x86 when booting from a floppy).</span> <span class="w"> </span><span class="err">[</span><span class="k">org</span><span class="w"> </span><span class="mh">0x7C00</span><span class="p">]</span><span class="w"> </span><span class="c1">; Pragma, tell the assembler where the</span> <span class="w"> </span><span class="c1">; code will be loaded.</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">bl</span><span 
class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; Starting direction for the worm.</span> <span class="w"> </span><span class="nf">push</span><span class="w"> </span><span class="mh">0xa000</span><span class="w"> </span><span class="c1">; Load address of VRAM into es.</span> <span class="w"> </span><span class="nf">pop</span><span class="w"> </span><span class="nb">es</span> <span class="nl">restart_game:</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="w"> </span><span class="mi">320</span><span class="o">*</span><span class="mi">100</span><span class="o">+</span><span class="mi">160</span><span class="w"> </span><span class="c1">; worm&#39;s starting position, center of</span> <span class="w"> </span><span class="c1">; screen</span> <span class="w"> </span><span class="c1">; Set video mode. Mode 13h is VGA (1 byte per pixel with the actual</span> <span class="w"> </span><span class="c1">; color stored in a palette), 320x200 total size. When restarting,</span> <span class="w"> </span><span class="c1">; this also clears the screen.</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0013</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="c1">; Draw borders. 
We assume the default palette will work for us.</span> <span class="w"> </span><span class="c1">; We also assume that starting at the bottom and drawing 2176 pixels</span> <span class="w"> </span><span class="c1">; wraps around and ends up drawing the top + bottom borders.</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">di</span><span class="p">,</span><span class="w"> </span><span class="mi">320</span><span class="o">*</span><span class="mi">199</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">cx</span><span class="p">,</span><span class="w"> </span><span class="mi">2176</span> <span class="w"> </span><span class="nf">rep</span> <span class="nl">draw_loop:</span> <span class="w"> </span><span class="nf">stosb</span><span class="w"> </span><span class="c1">; draw right border</span> <span class="w"> </span><span class="nf">stosb</span><span class="w"> </span><span class="c1">; draw left border</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">di</span><span class="p">,</span><span class="w"> </span><span class="mi">318</span> <span class="w"> </span><span class="nf">jnc</span><span class="w"> </span><span class="nv">draw_loop</span><span class="w"> </span><span class="c1">; notice the jump in the middle of the</span> <span class="w"> </span><span class="c1">; rep stosb instruction.</span> <span class="nl">game_loop:</span> <span class="w"> </span><span class="c1">; We read the keyboard input from port 0x60. 
This also reads bytes from</span> <span class="w"> </span><span class="c1">; the mouse, so we need to only handle [up (0x48), left (0x4b),</span> <span class="w"> </span><span class="c1">; right (0x4d), down (0x50)]</span> <span class="w"> </span><span class="nf">in</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x60</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x48</span> <span class="w"> </span><span class="nf">jb</span><span class="w"> </span><span class="nv">kb_handle_end</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x50</span> <span class="w"> </span><span class="nf">ja</span><span class="w"> </span><span class="nv">kb_handle_end</span> <span class="w"> </span><span class="c1">; At the end bx contains offset displacement (+1, -1, +320, -320)</span> <span class="w"> </span><span class="c1">; based on pressed/released keypad key. 
I bet there are a few bytes</span> <span class="w"> </span><span class="c1">; to shave around here given the bounds check above.</span> <span class="w"> </span><span class="nf">aaa</span> <span class="w"> </span><span class="nf">cbw</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">jc</span><span class="w"> </span><span class="nv">kb_handle</span> <span class="w"> </span><span class="nf">sub</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span> <span class="w"> </span><span class="nf">imul</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="kt">byte</span><span class="w"> </span><span class="o">-</span><span class="mh">0x50</span> <span class="nl">kb_handle:</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">bx</span><span class="p">,</span><span class="w"> </span><span class="nb">ax</span> <span class="nl">kb_handle_end:</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="w"> </span><span class="nb">bx</span> <span class="w"> </span><span class="c1">; The original code used set pallete command (10h/0bh) to wait for</span> <span class="w"> </span><span class="c1">; the vertical retrace. Today&#39;s computers are however too fast, so</span> <span class="w"> </span><span class="c1">; we use int 15h 86h instead. 
This also shaves a few bytes.</span> <span class="w"> </span><span class="c1">; Note: you&#39;ll have to tweak cx+dx if you are running this on a virtual</span> <span class="w"> </span><span class="c1">; machine vs real hardware. Casual testing seems to show that virtual machines</span> <span class="w"> </span><span class="c1">; wait ~3-4x longer than physical hardware.</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x86</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dh</span><span class="p">,</span><span class="w"> </span><span class="mh">0xef</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x15</span> <span class="w"> </span><span class="c1">; Draw worm and check for collision with parity</span> <span class="w"> </span><span class="c1">; (even parity=collision).</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x45</span> <span class="w"> </span><span class="nf">xor</span><span class="w"> </span><span class="p">[</span><span class="nb">es</span><span class="p">:</span><span class="nb">si</span><span class="p">],</span><span class="w"> </span><span class="nb">ah</span> <span class="w"> </span><span class="c1">; Go back to the main game loop.</span> <span class="w"> </span><span class="nf">jpo</span><span class="w"> </span><span class="nv">game_loop</span> <span class="w"> </span><span class="c1">; We hit a wall or the worm. 
Restart the game.</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">restart_game</span> <span class="kd">TIMES</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; Fill the rest of sector with 0</span> <span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; Boot signature at the end of bootloader</span> </pre></div> <p>Now run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>snake.asm<span class="w"> </span>-o<span class="w"> </span>snake.bin $<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>snake.bin </pre></div> <p><img src="bootloader-basics-snake.gif" alt="Recording of snake bootloader"></p> <p>What a phenomenal hack.</p> <p>I'm not going to get anywhere near that level of sophistication in this post but I think it's great motivation.</p> <h3 id="hello-world">Hello world</h3><p>Bootloaders are a mix of assembly programming and BIOS APIs for I/O. Since you're thinking about bootloaders you already know assembly basics. 
Now all you have to do is learn the APIs.</p> <p>The hello world bootloader has been explained in detail (see <a href="https://github.com/briansteffens/briansteffens.github.io/blob/master/blog/hello-world-from-a-bootloader/post.md">here</a>, <a href="https://www.ired.team/miscellaneous-reversing-forensics/windows-kernel-internals/writing-a-custom-bootloader">here</a>, and <a href="http://3zanders.co.uk/2017/10/13/writing-a-bootloader/">here</a>) so I won't go into too much line-by-line depth.</p> <p>In fact, let's just pull the code from the last of those blog posts.</p> <div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">hello.asm</span> <span class="k">bits</span><span class="w"> </span><span class="mi">16</span><span class="w"> </span><span class="c1">; tell NASM this is 16 bit code</span> <span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span><span class="w"> </span><span class="c1">; tell NASM to start outputting stuff at offset 0x7c00</span> <span class="nl">boot:</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">si</span><span class="p">,</span><span class="nv">hello</span><span class="w"> </span><span class="c1">; point si register to hello label memory location</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="mh">0x0e</span><span class="w"> </span><span class="c1">; 0x0e means &#39;Write Character in TTY mode&#39;</span> <span class="nl">.loop:</span> <span class="w"> </span><span class="nf">lodsb</span> <span class="w"> </span><span class="nf">or</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="nb">al</span><span class="w"> </span><span class="c1">; is al == 0 ?</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span 
class="nv">halt</span><span class="w"> </span><span class="c1">; if (al == 0) jump to halt label</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span><span class="w"> </span><span class="c1">; runs BIOS interrupt 0x10 - Video Services</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span> <span class="nl">halt:</span> <span class="w"> </span><span class="nf">cli</span><span class="w"> </span><span class="c1">; clear interrupt flag</span> <span class="w"> </span><span class="nf">hlt</span><span class="w"> </span><span class="c1">; halt execution</span> <span class="nl">hello:</span><span class="w"> </span><span class="kd">db</span><span class="w"> </span><span class="s">&quot;Hello world!&quot;</span><span class="p">,</span><span class="mi">0</span> <span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad remaining 510 bytes with zeroes</span> <span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; magic bootloader magic - marks this 512 byte sector bootable!</span> </pre></div> <p>The computer boots, prints "Hello world!" and hangs.</p> <p>But aside from clerical settings (16-bit assembly, where the program exists in memory, padding to 512 bytes) the only real bootloader-y magic in there is <code>int 0x10</code>, a BIOS interrupt.</p> <h4 id="bios-interrupts-=-api-calls-for-i/o">BIOS interrupts = API calls for I/O</h4><p>BIOS interrupts are API calls. 
Just like syscalls in userland programs, each family of APIs has an interrupt number to call and a register convention for its arguments.</p> <p>When you write bootloader programs you'll spend most of your time at first trying to understand the behavior of the various BIOS APIs.</p> <p>The two families we'll deal with in this post are the keyboard family (documentation <a href="https://stanislavs.org/helppc/int_16.html">here</a>) and the display family (documentation <a href="https://stanislavs.org/helppc/int_10.html">here</a>).</p> <h4 id="run-hello-world">Run hello world</h4><p>Anyway, back to the hello world. Assemble it with nasm and run it with qemu.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>hello.asm<span class="w"> </span>-o<span class="w"> </span>hello.bin $<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>hello.bin </pre></div> <p><img src="bootloader-basics-hello.gif" alt="Printing hello world"></p> <p>Getting the hang of it?</p> <h3 id="io-loop">IO Loop</h3><p>The specific function we called above to write a character to the display is <a href="https://stanislavs.org/helppc/int_10-e.html">INT 10,E</a>. The <code>0x10</code> is the operand you pass to the <code>int</code> instruction (e.g. <code>int 0x10</code>); it selects the family. The <code>E</code> is the specific function within the <code>0x10</code> family, and it is written into the <code>AH</code> register before calling <code>int</code>. The ASCII code to be written is placed in the <code>AL</code> register.</p> <p>Now that output makes some sense, let's do input. In the <a href="https://stanislavs.org/helppc/int_16.html">keyboard services documentation</a> you may notice that <a href="https://stanislavs.org/helppc/int_16-0.html">INT 16,0</a> provides a way to block for user input. 
According to that page the ASCII character will be in <code>AL</code> when the interrupt returns.</p> <h4 id="clearing-the-screen">Clearing the screen</h4><p>You may have noticed some text gets displayed before our program runs. We can use <a href="https://stanislavs.org/helppc/int_10-0.html">INT 10,0</a> to clear the screen: it sets the video mode (here mode <code>0x03</code>, 80x25 text), and setting the mode clears the screen as a side effect.</p> <div class="highlight"><pre><span></span>    ;; Clear screen
    mov ah, 0x00
    mov al, 0x03
    int 0x10
</pre></div> <h4 id="all-together">All together</h4><p>Since the display function reads from the same register the input function outputs to, we can just call the two interrupts one after the other. Wrap this in a loop and we have the world's worst editor.</p> <div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">ioloop</span><span class="o">.</span><span class="n">asm</span> <span class="n">bits</span><span class="w"> </span><span class="mi">16</span> <span class="n">org</span><span class="w"> </span><span class="mh">0x7c00</span> <span class="n">main</span><span class="p">:</span> <span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Clear</span><span class="w"> </span><span class="n">screen</span> <span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span> <span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="o">.</span><span class="n">loop</span><span class="p">:</span> <span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Read</span><span class="w"> </span><span class="n">character</span> <span class="w"> </span><span 
class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x16</span> <span class="w"> </span><span class="p">;;</span><span class="w"> </span><span class="n">Print</span><span class="w"> </span><span class="n">character</span> <span class="w"> </span><span class="n">mov</span><span class="w"> </span><span class="n">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span> <span class="w"> </span><span class="nb nb-Type">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="n">jmp</span><span class="w"> </span><span class="o">.</span><span class="n">loop</span> <span class="n">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="o">$-$$</span><span class="p">)</span><span class="w"> </span><span class="n">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="n">pad</span><span class="w"> </span><span class="n">remaining</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="n">bytes</span><span class="w"> </span><span class="n">with</span><span class="w"> </span><span class="n">zeroes</span> <span class="n">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="n">magic</span><span class="w"> </span><span class="n">bootloader</span><span class="w"> </span><span class="n">magic</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">marks</span><span class="w"> </span><span class="n">this</span><span class="w"> </span><span 
class="mi">512</span><span class="w"> </span><span class="n">byte</span><span class="w"> </span><span class="n">sector</span><span class="w"> </span><span class="n">bootable</span><span class="o">!</span> </pre></div> <p class="note"> By the way, the <code>main</code> label here (like the <code>boot</code> label above in <code>hello.asm</code>) is only to help the reader. It is not something the BIOS uses. </p><p>Now that we've got the code, let's run it!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>ioloop.asm<span class="w"> </span>-o<span class="w"> </span>ioloop.bin $<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>ioloop.bin </pre></div> <p><img src="bootloader-basics-ioloop.gif" alt="Recording of ioloop bootloader"></p> <h3 id="digression-on-abstraction">Digression on abstraction</h3><p>There are two ways to build abstractions: assembly functions and nasm macros.</p> <p>We could build a clear screen function like this:</p> <div class="highlight"><pre><span></span><span class="nl">clear_screen:</span> <span class="w"> </span><span class="c1">;; Clear screen</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="nf">ret</span> </pre></div> <p>And then we can call this in the ioloop program like so:</p> <div class="highlight"><pre><span></span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span> <span class="k">org</span><span class="w"> </span><span 
class="mh">0x7c00</span> <span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span> <span class="nl">clear_screen:</span> <span class="w"> </span><span class="c1">;; Clear screen</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="nf">ret</span> <span class="nl">main:</span> <span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">clear_screen</span> <span class="nl">.loop:</span> <span class="w"> </span><span class="c1">;; Read character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span> <span class="w"> </span><span class="c1">;; Print character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span> <span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span 
class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad remaining 510 bytes with zeroes</span> <span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; magic bootloader magic - marks this 512 byte sector bootable!</span> </pre></div> <p>On the other hand if you do it in a macro:</p> <div class="highlight"><pre><span></span><span class="k">bits</span><span class="w"> </span><span class="mi">16</span> <span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span> <span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span> <span class="cp">%macro cls 0 </span><span class="c1">; Zero is the number of arguments</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> <span class="nl">main:</span> <span class="w"> </span><span class="nf">cls</span> <span class="nl">.loop:</span> <span class="w"> </span><span class="c1">;; Read character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span> <span class="w"> </span><span class="c1">;; Print character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span> <span class="w"> 
</span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span> <span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad remaining 510 bytes with zeroes</span> <span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; magic bootloader magic - marks this 512 byte sector bootable!</span> </pre></div> <p>And nasm macros even have a way to write macro-safe labels by prefixing them with <code>%%</code> which is useful if you have conditions or loops within a macro.</p> <p>The benefit of a macro I guess is that you're not using up the stack. The benefit of a function call is that you're not duplicating code every place you use a macro. The amount of generated code eventually becomes important in bootloaders because the code must fit into 512 bytes.</p> <p>I lean more toward using macros in this code.</p> <h3 id="complex-input">Complex input</h3><p>Reading ASCII characters is not complicated as we saw above. But what if we want to build Readline style shortcuts like ctrl-a for jumping to the start of the line?</p> <p>Using INT 16,0 as we do above is fine. 
But rather than solely reading from the result of that function call, there is a section of memory that contains both the character pressed and control characters pressed.</p> <p>Based on documentation for this memory area (found <a href="http://www.techhelpmanual.com/93-rom_bios_variables.html">here</a> or <a href="https://www.tau.ac.il/~flaxer/edu/course/processcontrol/BiosDataArea.pdf">here</a>), we can build a macro for reading the pressed character:</p> <div class="highlight"><pre><span></span><span class="cp">%macro mov_read_character_into 1</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x041a</span><span class="p">]</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03fe</span><span class="w"> </span><span class="c1">; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFFFF</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nb">eax</span><span class="p">]</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFF</span> <span class="cp">%endmacro</span> </pre></div> <p>And another macro for reading the pressed control character (if any):</p> <div class="highlight"><pre><span></span><span class="cp">%macro mov_read_ctrl_flag_into 1</span> <span class="w"> </span><span 
class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x0417</span><span class="p">]</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0x04</span><span class="w"> </span><span class="c1">; Grab 3rd bit: %1 &amp; 0b0100</span> <span class="cp">%endmacro</span> </pre></div> <h3 id="cursor-location">Cursor location</h3><p>Lastly we'll use some cursor APIs that allow us to handle newlines, backspace on the first column of a line, and ctrl-a (jump to beginning of line).</p> <div class="highlight"><pre><span></span><span class="cp">%macro get_position 0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> <span class="cp">%macro set_position 0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x02</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> </pre></div> <p>But there's something buggy about my <code>goto_end_of_line</code> function. Sometimes it works and sometimes it just jumps all over the screen in an infinite loop. Part of the problem is that the editor memory is the video card. 
The cursor location is only stored there and not in some program state like you might do in a high-level environment/language.</p> <div class="highlight"><pre><span></span><span class="nl">goto_end_of_line:</span> <span class="c1">;; Get current character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="c1">;; Iterate until the character is null</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> <span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dl</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">goto_end_of_line</span> <span class="nl">.done:</span> <span class="w"> </span><span class="nf">ret</span> </pre></div> <p>Alright, let's put all these pieces together.</p> <h3 id="editor-with-keyboard-shortcuts">Editor with keyboard shortcuts</h3><p>Start with the basics in <code>editor.asm</code>.</p> <div class="highlight"><pre><span></span><span class="c1">; -*- mode: nasm;-*-</span> <span class="k">bits</span><span class="w"> </span><span class="mi">16</span> <span class="k">org</span><span class="w"> </span><span class="mh">0x7c00</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">main</span> </pre></div> <p>Then add a clear screen macro.</p> <div class="highlight"><pre><span></span><span class="cp">%macro cls 0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> 
</span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x00</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> </pre></div> <p>Add macros for reading and printing.</p> <div class="highlight"><pre><span></span><span class="cp">%macro read_character 0</span> <span class="w"> </span><span class="c1">;; Read character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x16</span> <span class="cp">%endmacro</span> <span class="cp">%macro print_character 1</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ax</span><span class="p">,</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0e</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> </pre></div> <p>Add cursor utilities.</p> <div class="highlight"><pre><span></span><span class="cp">%macro get_position 0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> <span class="cp">%macro set_position 
0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x02</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="cp">%endmacro</span> <span class="nl">goto_end_of_line:</span> <span class="c1">;; Get current character</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="c1">;; Iterate until the character is null</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> <span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dl</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">goto_end_of_line</span> <span class="nl">.done:</span> <span class="w"> </span><span class="nf">ret</span> </pre></div> <p>And keyboard utilities.</p> <div class="highlight"><pre><span></span><span class="cp">%macro mov_read_ctrl_flag_into 1</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x0417</span><span class="p">]</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span 
class="mh">0x04</span><span class="w"> </span><span class="c1">; Grab 3rd bit: %1 &amp; 0b0100</span> <span class="cp">%endmacro</span> <span class="cp">%macro mov_read_character_into 1</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="mh">0x041a</span><span class="p">]</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0x03fe</span><span class="w"> </span><span class="c1">; Offset from 0x0400 - sizeof(uint16) (since head points to next free slot, not last/current slot)</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="nb">eax</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFFFF</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nb">eax</span><span class="p">]</span> <span class="w"> </span><span class="nf">and</span><span class="w"> </span><span class="o">%</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mh">0xFF</span> <span class="cp">%endmacro</span> </pre></div> <p>Now we can start the editor loop where we wait for a keypress and handle it.</p> <div class="highlight"><pre><span></span><span class="nl">editor_action:</span> <span class="w"> </span><span class="nf">read_character</span> </pre></div> <p>Don't print ASCII garbage if the key pressed is an arrow key. Just do nothing. 
(This isn't good editor behavior in general but ours is a limited one.)</p> <div class="highlight"><pre><span></span><span class="c1">;; Ignore arrow keys</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x4b</span><span class="w"> </span><span class="c1">; Left</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="c1">; Down</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x4d</span><span class="w"> </span><span class="c1">; Right</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x48</span><span class="w"> </span><span class="c1">; Up</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.done</span> </pre></div> <p>Next handle backspace.</p> <div class="highlight"><pre><span></span><span class="c1">;; Handle backspace</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x08</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.is_backspace</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> 
</span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x7F</span><span class="w"> </span><span class="c1">; For mac keyboards</span> <span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.done_backspace</span> <span class="nl">.is_backspace:</span> <span class="w"> </span><span class="nf">get_position</span> </pre></div> <p>If backspace is pressed at the first line and the first column, there is nowhere to move back to, so skip the cursor movement and just clear the character under the cursor.</p> <div class="highlight"><pre><span></span><span class="c1">;; Handle 0,0 coordinate (don't move the cursor)</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="nb">dh</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="nb">dl</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.overwrite_character</span> </pre></div> <p>Otherwise, if backspace is pressed anywhere past the first column, move the cursor back one column and overwrite that character with ASCII NUL (the code 0, not the digit 0).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.backspace_at_start_of_line</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">dl</span><span class="w"> </span><span class="c1">; Decrement column</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.overwrite_character</span> </pre></div> <p>Otherwise you're at the beginning of the line and you need to 
jump to the end of the previous line.</p> <div class="highlight"><pre><span></span><span class="nl">.backspace_at_start_of_line:</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="nb">dh</span><span class="w"> </span><span class="c1">; Decrement row</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">goto_end_of_line</span> <span class="nl">.overwrite_character:</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">ah</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0a</span> <span class="w"> </span><span class="nf">int</span><span class="w"> </span><span class="mh">0x10</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span> <span class="nl">.done_backspace:</span> </pre></div> <p>Next we handle the Enter key. 
This should move the cursor onto the next line and set the column back to zero.</p> <div class="highlight"><pre><span></span><span class="c1">;; Handle enter</span> <span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mh">0x0d</span> <span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.done_enter</span> <span class="w"> </span><span class="nf">get_position</span> <span class="w"> </span><span class="nf">inc</span><span class="w"> </span><span class="nb">dh</span><span class="w"> </span><span class="c1">; Increment line</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; Reset column</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span> <span class="nl">.done_enter:</span> </pre></div> <p>Next we handle ctrl-a, jump to start of line.</p> <div class="highlight"><pre><span></span><span class="c1">;; Handle ctrl- shortcuts</span> <span class="c1">;; Check ctrl key</span> <span class="w"> </span><span class="nf">mov_read_ctrl_flag_into</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">jz</span><span class="w"> </span><span class="nv">.ctrl_not_set</span> <span class="c1">;; Handle ctrl-a shortcut</span> <span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> 
</span><span class="mi">1</span><span class="w"> </span><span class="c1">; With ctrl held, letters a-z arrive as control codes 1-26, so ctrl-a is 1</span> <span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.not_ctrl_a</span> <span class="c1">;; Reset column</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="nb">dl</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">set_position</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span> <span class="nl">.not_ctrl_a:</span> </pre></div> <p>For ctrl-e, jump to the end of the line.</p> <div class="highlight"><pre><span></span><span class="c1">;; Handle ctrl-e shortcut</span> <span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">cmp</span><span class="w"> </span><span class="nb">al</span><span class="p">,</span><span class="w"> </span><span class="mi">5</span> <span class="w"> </span><span class="nf">jnz</span><span class="w"> </span><span class="nv">.not_ctrl_e</span> <span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">goto_end_of_line</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span> <span class="nl">.not_ctrl_e:</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.done</span> <span class="nl">.ctrl_not_set:</span> </pre></div> <p>Finally, if none of these cases match, just print the pressed character and return.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">mov_read_character_into</span><span class="w"> </span><span class="nb">ax</span> <span class="w"> </span><span class="nf">print_character</span><span class="w"> </span><span 
class="nb">ax</span> <span class="nl">.done:</span> <span class="w"> </span><span class="nf">ret</span> </pre></div> <p>Finally, create the main function that calls this editor code in a loop.</p> <div class="highlight"><pre><span></span><span class="nl">main:</span> <span class="w"> </span><span class="nf">cls</span> <span class="nl">.loop:</span> <span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="nv">editor_action</span> <span class="w"> </span><span class="nf">jmp</span><span class="w"> </span><span class="nv">.loop</span> <span class="kd">times</span><span class="w"> </span><span class="mi">510</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="kc">$</span><span class="o">-</span><span class="kc">$$</span><span class="p">)</span><span class="w"> </span><span class="nv">db</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="c1">; pad the rest of the first 510 bytes with zeroes</span> <span class="kd">dw</span><span class="w"> </span><span class="mh">0xaa55</span><span class="w"> </span><span class="c1">; bootloader magic number - marks this 512 byte sector bootable!</span> </pre></div> <p>And we're done! Try it out:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>nasm<span class="w"> </span>-f<span class="w"> </span>bin<span class="w"> </span>editor.asm<span class="w"> </span>-o<span class="w"> </span>editor.bin $<span class="w"> </span>qemu-system-x86_64<span class="w"> </span>-fda<span class="w"> </span>editor.bin </pre></div> <p><img src="bootloader-basics-editor.gif" alt="Recording of a bad editor"></p> <p>Tedious and buggy! But I learned something, I think.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on my first time exploring bootloader basics! 
Neat to discover the BIOS APIs and spend some time actually coding in assembly versus just generating or emulating it.<a href="https://t.co/7iP6Nib620">https://t.co/7iP6Nib620</a> <a href="https://t.co/xSyG1IXgEB">pic.twitter.com/xSyG1IXgEB</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1485398216124346371?ref_src=twsrc%5Etfw">January 23, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/bootloader-basics.htmlSun, 23 Jan 2022 00:00:00 +0000dsq: Commandline tool for running SQL queries against JSON, CSV, Excel, Parquet, and more.http://notes.eatonphil.com/dsq.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-01-11-dsq.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-01-11-dsq.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/dsq.htmlTue, 11 Jan 2022 00:00:00 +0000Analyzing large JSON files via partial JSON parsinghttp://notes.eatonphil.com/analyzing-large-json-files-via-partial-json-parsing.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2022-01-06-analyzing-large-json-files-via-partial-json-parsing.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2022-01-06-analyzing-large-json-files-via-partial-json-parsing.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/analyzing-large-json-files-via-partial-json-parsing.htmlThu, 06 Jan 2022 00:00:00 +0000The year in books: 11 to recommend in 2021http://notes.eatonphil.com/year-in-books-2021.html<p>Last year (2021) I finished 17 books, a five-year low. But that's ok! 4 fiction and 13 non-fiction. 
Another 30 started but not finished.</p> <h3 id="non-fiction">Non-fiction</h3><p>It seems I was pretty focused on business history and the history of tech. The 8 non-fiction books I liked the most:</p> <ul> <li><a href="https://www.goodreads.com/book/show/34626431-designing-data-intensive-applications">Designing Data-Intensive Applications</a>, a must-read for anyone interacting with a database</li> <li><a href="https://www.goodreads.com/book/show/24715220-my-years-with-general-motors">My Years with General Motors</a>, the business school classic; truly a good read. But it's sad to know that shortly after it was written, GM succumbed to Japanese and South Korean competition</li> <li><a href="https://www.goodreads.com/book/show/49195924-no-rules-rules">No Rules Rules: Netflix and the Culture of Reinvention</a></li> <li><a href="https://www.goodreads.com/book/show/55297149-working-backwards">Working Backwards: Insights, Stories, and Secrets from Inside Amazon</a></li> <li><a href="https://www.goodreads.com/book/show/54216469-working-in-public">Working in Public: The Making and Maintenance of Open Source Software</a>, my review <a href="https://www.goodreads.com/review/show/3478346828?book_show_action=false&amp;from_review_page=1">here</a></li> <li><a href="https://www.goodreads.com/book/show/22401445-intel-trinity-the">The Intel Trinity</a>, an early history of Intel</li> <li><a href="https://www.goodreads.com/book/show/19383579-the-hp-way">The HP Way</a></li> <li><a href="https://www.goodreads.com/book/show/58208477-play-nice-but-win">Play Nice But Win</a>, the story of Dell computers</li> <li><a href="https://www.goodreads.com/book/show/36316219-west-with-the-night">West with the Night</a>, a beautiful memoir recommended by Ernest Hemingway and written in a similar style. 
Much more enjoyable than the other more popular colonial-African memoir, Out of Africa.</li> </ul> <h4 id="the-rest">The rest</h4><ul> <li><a href="https://www.goodreads.com/book/show/16059922-pour-your-heart-into-it">Pour Your Heart Into It: How Starbucks Built a Company One Cup at a Time</a></li> <li><a href="https://www.goodreads.com/book/show/43063719-jump-starting-america">Jump-Starting America: How Breakthrough Science Can Revive Economic Growth and the American Dream</a></li> <li><a href="https://www.goodreads.com/book/show/9118033-rework">ReWork</a></li> <li><a href="https://www.goodreads.com/book/show/297901.Russia_and_the_Russians">Russia and the Russians: A History</a></li> </ul> <h3 id="fiction">Fiction</h3><p>The 3 fiction books I liked the most:</p> <ul> <li><a href="https://www.goodreads.com/book/show/12970829-a-very-british-coup">A Very British Coup</a>, hilarious and depressing. A great companion to the TV show "Yes, Minister"</li> <li><a href="https://www.goodreads.com/book/show/8862633-mort">Mort</a>, Terry Pratchett is a very funny author</li> <li><a href="https://www.goodreads.com/book/show/18625885-selected-stories-of-philip-k-dick">Selected Stories of Philip K Dick</a>, depressing and dystopian but very well written. I would not read again because it's too depressing</li> </ul> <h4 id="the-rest">The rest</h4><ul> <li><a href="https://www.goodreads.com/book/show/51135871-there-and-never-ever-back-again">There and NEVER, EVER BACK AGAIN: A Dark Lord's Diary</a>, I was looking for more parodies like Bored of the Rings (which itself wasn't great). This was worse</li> </ul> <h3 id="2022">2022</h3><p>This year I'm interested in continuing to find good business books and good books on the history of tech. 
I'm also getting into more American history to make up for all the years of not paying attention in high school.</p> <p>I'm continuing to try to find good memoirs and fiction by non-English authors.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Starting the blog-year off gently with my recap of 2021 in books.<br><br>I spent too much time watching TV and trying new video games to keep up with past years 😅<a href="https://t.co/5mfXbBnihk">https://t.co/5mfXbBnihk</a> <a href="https://t.co/ZHmPsUcr3g">pic.twitter.com/ZHmPsUcr3g</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1478764597033283591?ref_src=twsrc%5Etfw">January 5, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/year-in-books-2021.htmlWed, 05 Jan 2022 00:00:00 +0000Writing a minimal Lua implementation with a virtual machine from scratch in Rusthttp://notes.eatonphil.com/lua-in-rust.html<p>By the end of this guide we'll have a minimal, working implementation of a small part of Lua from scratch. 
It will be able to run the following program (among others):</p> <div class="highlight"><pre><span></span><span class="kr">function</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="kr">if</span> <span class="n">n</span> <span class="o">&lt;</span> <span class="mi">2</span> <span class="kr">then</span> <span class="kr">return</span> <span class="n">n</span><span class="p">;</span> <span class="kr">end</span> <span class="kd">local</span> <span class="n">n1</span> <span class="o">=</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">1</span><span class="p">);</span> <span class="kd">local</span> <span class="n">n2</span> <span class="o">=</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span><span class="o">-</span><span class="mi">2</span><span class="p">);</span> <span class="kr">return</span> <span class="n">n1</span> <span class="o">+</span> <span class="n">n2</span><span class="p">;</span> <span class="kr">end</span> <span class="nb">print</span><span class="p">(</span><span class="n">fib</span><span class="p">(</span><span class="mi">30</span><span class="p">));</span> </pre></div> <p>This is my second project in Rust and only the third time I've invented an instruction set, so don't take my style as gospel. However, I have found some Rust parsing tutorials overly complex, so I'm hoping you'll find this one simpler.</p> <p>All <a href="https://github.com/eatonphil/lust">source code is available on GitHub</a>.</p> <h3 id="entrypoint">Entrypoint</h3><p>Running <code>cargo init</code> generates the necessary boilerplate. 
In <code>src/main.rs</code> we'll accept a file name from the command line, perform lexical analysis to retrieve all tokens from the file, perform grammar analysis on the tokens to retrieve a tree structure, compile the tree to a linear set of virtual machine instructions, and finally interpret the virtual machine instructions.</p> <div class="highlight"><pre><span></span><span class="k">mod</span> <span class="nn">eval</span><span class="p">;</span> <span class="k">mod</span> <span class="nn">lex</span><span class="p">;</span> <span class="k">mod</span> <span class="nn">parse</span><span class="p">;</span> <span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">env</span><span class="p">;</span> <span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">fs</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">args</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">String</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">env</span>::<span class="n">args</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">contents</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fs</span>::<span class="n">read_to_string</span><span class="p">(</span><span class="o">&amp;</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]).</span><span class="n">expect</span><span class="p">(</span><span class="s">&quot;Could not read file&quot;</span><span class="p">);</span> <span class="w"> </span><span 
class="kd">let</span><span class="w"> </span><span class="n">raw</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="kt">char</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">contents</span><span class="p">.</span><span class="n">chars</span><span class="p">().</span><span class="n">collect</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">lex</span>::<span class="n">lex</span><span class="p">(</span><span class="o">&amp;</span><span class="n">raw</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span> <span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span><span class="s">&quot;{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">parse</span>::<span class="n">parse</span><span class="p">(</span><span class="o">&amp;</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span 
class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">ast</span><span class="p">,</span> <span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">msg</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span><span class="s">&quot;{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">pgrm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eval</span>::<span class="n">compile</span><span class="p">(</span><span class="o">&amp;</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">ast</span><span class="p">);</span> <span class="w"> </span><span class="n">eval</span>::<span class="n">eval</span><span class="p">(</span><span class="n">pgrm</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Easy peasy. 
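</p> <p>Note that <code>main</code> indexes <code>args[1]</code> directly, so running the binary without a file name panics. A minimal sketch of a friendlier check, assuming a hypothetical helper named <code>file_arg</code> that is not part of the original code:</p>

```rust
// Hypothetical helper (not in the original post): pick the script path out of
// the argument list, or build a usage message when it is missing.
fn file_arg(args: &[String]) -> Result<&str, String> {
    match args.get(1) {
        // args[0] is the program name, so the file name is args[1].
        Some(path) => Ok(path.as_str()),
        None => {
            // Fall back to a generic name if even args[0] is absent.
            let program = args.first().map(String::as_str).unwrap_or("program");
            Err(format!("Usage: {} FILE", program))
        }
    }
}
```

<p>In <code>main</code>, something like this could replace the direct <code>&amp;args[1]</code> lookup, printing the usage message and exiting when it returns <code>Err</code>.</p> <p>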
Now let's implement <code>lex</code>.</p> <h3 id="lexical-analysis">Lexical analysis</h3><p>Lexical analysis drops whitespace (Lua is not whitespace sensitive) and chunks all source code characters into their smallest possible meaningful pieces like commas, numbers, identifiers, keywords, etc.</p> <p>In order to have useful error messages, we'll keep track of state in the file with a <code>Location</code> struct that implements <code>increment</code> and <code>debug</code>.</p> <p>This goes in <code>src/lex.rs</code>.</p> <div class="highlight"><pre><span></span><span class="cp">#[derive(Copy, Clone, Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">col</span>: <span class="kt">i32</span><span class="p">,</span> <span class="w"> </span><span class="n">line</span>: <span class="kt">i32</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span> <span class="p">}</span> </pre></div> <p>The <code>increment</code> function will update line and column numbers as well as the current index in the file.</p> <div class="highlight"><pre><span></span><span class="kd">impl</span><span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">fn</span><span class="w"> </span><span class="nx">increment</span><span class="p">(</span><span class="o">&amp;</span><span class="kp">self</span><span class="p">,</span><span class="w"> </span><span class="nx">newline</span><span class="p">:</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">newline</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="nx">col</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="nx">line</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">line</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="nx">col</span><span class="p">:</span><span class="w"> </span><span class="kp">self</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="nx">line</span><span class="p">:</span><span class="w"> </span><span 
class="kp">self</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And the <code>debug</code> function will dump the current line with a pointer in text to the current column along with a message.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pub</span><span class="w"> </span><span class="n">fn</span><span class="w"> </span><span class="n">debug</span><span class="o">&lt;</span><span class="nl">S</span><span class="p">:</span><span class="w"> </span><span class="k">Into</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;&gt;</span><span class="p">(</span><span class="o">&amp;</span><span class="n">self</span><span class="p">,</span><span class="w"> </span><span class="nl">raw</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;[</span><span class="n">char</span><span class="o">]</span><span class="p">,</span><span class="w"> </span><span class="nl">msg</span><span class="p">:</span><span class="w"> </span><span class="n">S</span><span class="p">)</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">mut</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="n">mut</span><span class="w"> </span><span class="n">line_str</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nl">String</span><span class="p">:</span><span class="err">:</span><span 
class="k">new</span><span class="p">();</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Find</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">whole</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="n">original</span><span class="w"> </span><span class="n">source</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">raw</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">*</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">&#39;\n&#39;</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="o">//</span><span class="w"> </span><span class="n">Done</span><span class="w"> </span><span class="n">discovering</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="ow">in</span><span class="w"> </span><span class="n">question</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="err">!</span><span class="n">line_str</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="err">}</span> 
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">self</span><span class="p">.</span><span class="n">line</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="err">{</span> <span class="w"> </span><span class="n">line_str</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="err">}</span> <span class="w"> </span><span class="n">let</span><span class="w"> </span><span class="nf">space</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ss">&quot; &quot;</span><span class="p">.</span><span class="n">repeat</span><span class="p">(</span><span class="n">self</span><span class="p">.</span><span class="n">col</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">usize</span><span class="p">);</span> <span class="w"> </span><span class="nf">format</span><span class="err">!</span><span class="p">(</span><span class="ss">&quot;{}\n\n{}\n{}^ Near here&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">msg</span><span class="p">.</span><span class="k">into</span><span class="p">(),</span><span class="w"> </span><span class="n">line_str</span><span class="p">,</span><span class="w"> </span><span class="nf">space</span><span class="p">)</span> <span class="w"> </span><span class="err">}</span> <span class="err">}</span> </pre></div> <p>The smallest individual unit after lexical analysis is a token which is either a keyword, number, identifier, operator, or syntax. 
(This implementation skips a lot of real Lua syntax, such as strings.)</p> <div class="highlight"><pre><span></span><span class="cp">#[derive(Debug, PartialEq, Eq, Clone)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">TokenKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Identifier</span><span class="p">,</span> <span class="w"> </span><span class="n">Syntax</span><span class="p">,</span> <span class="w"> </span><span class="n">Keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">Number</span><span class="p">,</span> <span class="w"> </span><span class="n">Operator</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug, Clone)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">value</span>: <span class="nb">String</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">loc</span>: <span class="nc">Location</span><span class="p">,</span> <span class="p">}</span> </pre></div> <p>The top-level <code>lex</code> function iterates over the file, trying a lex helper for each kind of token at the current location, and returns a vector of all tokens on success. 
In between lexing it will "eat whitespace".</p> <div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">lex</span><span class="p">(</span><span class="n">s</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">])</span><span class="w"> </span>-&gt; <span class="nb">Result</span><span class="o">&lt;</span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="nb">String</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">col</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="n">index</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="n">line</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">len</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">tokens</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="fm">vec!</span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">lexers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span> <span class="w"> </span><span class="n">lex_keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">lex_identifier</span><span class="p">,</span> <span class="w"> </span><span class="n">lex_number</span><span class="p">,</span> <span class="w"> </span><span class="n">lex_syntax</span><span class="p">,</span> <span class="w"> </span><span class="n">lex_operator</span><span class="p">,</span> <span class="w"> </span><span class="p">];</span> <span class="w"> </span><span class="o">&#39;</span><span class="na">outer</span>: <span class="nc">while</span><span class="w"> </span><span class="n">loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">eat_whitespace</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">lexer</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span 
class="n">lexers</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexer</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">t</span><span class="p">,</span><span class="w"> </span><span class="n">next_loc</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">;</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">t</span><span class="p">);</span> <span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nl">&#39;outer</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Unrecognized character while lexing:&quot;</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span 
class="n">tokens</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h4 id="whitespace">Whitespace</h4><p>Eating whitespace is just incrementing the location while we see a space, tab, newline, etc.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">eat_whitespace</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nc">Location</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">[</span><span class="sc">&#39; &#39;</span><span class="p">,</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="p">,</span><span class="w"> </span><span class="sc">&#39;\r&#39;</span><span class="p">,</span><span class="w"> </span><span class="sc">&#39;\t&#39;</span><span class="p">].</span><span class="n">contains</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_loc</span> <span class="p">}</span> </pre></div> <h4 id="numbers">Numbers</h4><p>Lexing numbers iterates through the source starting at a position until it stops seeing decimal digits (this implementation only supports integers).</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span 
class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_digit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ident</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span> <span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span 
class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>If no digits were consumed then this is not a number.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">ident</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span>: <span class="nc">ident</span><span class="p">,</span> <span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Number</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="n">next_loc</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">None</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="identifiers">Identifiers</h4><p>An identifier is any sequence of letters, digits, and underscores.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_identifier</span><span class="p">(</span><span 
class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ident</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;_&#39;</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ident</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span> <span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>But an identifier cannot start with a digit.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// First character must not be a digit</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">ident</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="o">!</span><span class="n">ident</span><span class="p">.</span><span class="n">chars</span><span class="p">().</span><span class="n">next</span><span class="p">().</span><span class="n">unwrap</span><span class="p">().</span><span class="n">is_digit</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span 
class="n">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span>: <span class="nc">ident</span><span class="p">,</span> <span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Identifier</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="n">next_loc</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">None</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="keywords">Keywords</h4><p>Keywords look like identifiers, but they are reserved: the user cannot use them as variable names.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_keyword</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;function&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;end&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;if&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;then&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;local&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;return&quot;</span><span class="p">];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="o">&#39;</span><span class="na">outer</span>: <span class="nc">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">;</span> <span class="w"> </span><span 
class="k">while</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;_&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">push_str</span><span class="p">(</span><span class="o">&amp;</span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">());</span> <span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span> <span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">[</span><span class="o">..</span><span class="n">n</span><span 
class="p">]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nl">&#39;outer</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Not a complete match</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">possible_syntax</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// If it got to this point it found a match, so exit early.</span> <span class="w"> </span><span class="c1">// We don&#39;t need a longest match.</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">value</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span 
class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Aside from matching against a list of strings, we have to make sure there is a complete match. For example, <code>function1</code> is not a keyword; it's a valid identifier. In contrast, <code>function 1</code> is a valid sequence of tokens (the keyword <code>function</code> and the number <code>1</code>), even though it's not valid Lua.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// If the next character would be part of a valid identifier, then</span> <span class="w"> </span><span class="c1">// this is not a keyword.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">raw</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">next_c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">next_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_c</span><span class="p">.</span><span class="n">is_alphanumeric</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">next_c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;_&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span>: <span class="nc">value</span><span class="p">,</span> <span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Keyword</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="n">next_loc</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="p">}</span> </pre></div> <h4 id="syntax">Syntax</h4><p>Syntax (in this context) is just language junk that isn't operators. 
Things like commas, parentheses, etc.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_syntax</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;;&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;=&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;(&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;)&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">];</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">syntax</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> 
</span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span> <span class="w"> </span><span class="c1">// TODO: this won&#39;t work with multiple-character syntax bits like &gt;= or ==</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span>: <span class="nc">possible_syntax</span><span class="p">.</span><span class="n">to_string</span><span class="p">(),</span> <span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Syntax</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="n">next_loc</span><span class="p">,</span> <span class="w"> </span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">None</span> <span class="p">}</span> </pre></div> <h4 id="operators">Operators</h4><p>Operators are things like plus, minus, and less than symbols. 
Operators are syntax, but it helps us later on to break these out into a separate type of token.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">lex_operator</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">initial_loc</span>: <span class="nc">Location</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Token</span><span class="p">,</span><span class="w"> </span><span class="n">Location</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">operators</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s">&quot;+&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;-&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&lt;&quot;</span><span class="p">];</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">operators</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw</span><span class="p">[</span><span class="n">initial_loc</span><span class="p">.</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">next_loc</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">initial_loc</span><span class="p">.</span><span class="n">increment</span><span class="p">(</span><span class="kc">false</span><span class="p">);</span> <span class="w"> </span><span class="c1">// TODO: this won&#39;t work with multiple-character operators like &gt;= or ==</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">possible_syntax</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">c</span><span class="p">.</span><span class="n">to_string</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">value</span>: <span class="nc">possible_syntax</span><span class="p">.</span><span class="n">to_string</span><span class="p">(),</span> <span class="w"> </span><span class="n">loc</span>: <span class="nc">initial_loc</span><span class="p">,</span> <span class="w"> </span><span class="n">kind</span>: <span class="nc">TokenKind</span>::<span class="n">Operator</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="n">next_loc</span><span class="p">,</span> <span class="w"> </span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">None</span> <span class="p">}</span> </pre></div> <p>And now we're all done lexing!</p> <h3 id="grammar-analysis">Grammar analysis</h3><p>Parsing finds grammatical (tree) patterns in a flat list of tokens. The resulting tree is called a syntax tree or abstract syntax tree (AST).</p> <p>The boring part is defining the tree. 
Generally speaking (and specifically for this project), the syntax tree is a list of statements. Statements can be function definitions or expression statements or if statements or return statements or local declarations.</p> <p>This goes in <code>src/parse.rs</code>.</p> <div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Statement</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Expression</span><span class="p">(</span><span class="n">Expression</span><span class="p">),</span> <span class="w"> </span><span class="n">If</span><span class="p">(</span><span class="n">If</span><span class="p">),</span> <span class="w"> </span><span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">FunctionDeclaration</span><span class="p">),</span> <span class="w"> </span><span class="n">Return</span><span class="p">(</span><span class="n">Return</span><span class="p">),</span> <span class="w"> </span><span class="n">Local</span><span class="p">(</span><span class="n">Local</span><span class="p">),</span> <span class="p">}</span> <span class="k">pub</span><span class="w"> </span><span class="k">type</span> <span class="nc">Ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Statement</span><span class="o">&gt;</span><span class="p">;</span> </pre></div> <p>There's almost nothing special at all about the rest of the tree definitions.</p> <div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Literal</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Identifier</span><span class="p">(</span><span class="n">Token</span><span 
class="p">),</span> <span class="w"> </span><span class="n">Number</span><span class="p">(</span><span class="n">Token</span><span class="p">),</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">FunctionCall</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">arguments</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Expression</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">BinaryOperation</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">operator</span>: <span class="nc">Token</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">left</span>: <span class="nb">Box</span><span class="o">&lt;</span><span class="n">Expression</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">right</span>: <span class="nb">Box</span><span class="o">&lt;</span><span class="n">Expression</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">enum</span> <span class="nc">Expression</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">FunctionCall</span><span class="p">(</span><span 
class="n">FunctionCall</span><span class="p">),</span> <span class="w"> </span><span class="n">BinaryOperation</span><span class="p">(</span><span class="n">BinaryOperation</span><span class="p">),</span> <span class="w"> </span><span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span><span class="p">),</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">FunctionDeclaration</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">parameters</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">body</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Statement</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">If</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">test</span>: <span class="nc">Expression</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">body</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Statement</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span 
class="nc">Local</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">name</span>: <span class="nc">Token</span><span class="p">,</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">Expression</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">pub</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">Expression</span><span class="p">,</span> <span class="p">}</span> </pre></div> <p>And that's it for the AST!</p> <h4 id="some-helpers">Some helpers</h4><p>Lastly before the fun part, we'll define a few helpers for validating each kind of token.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">expect_keyword</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="kt">bool</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> 
</span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Keyword</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">value</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">expect_syntax</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="n">value</span>: <span class="kp">&amp;</span><span class="kt">str</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="kt">bool</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Syntax</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">value</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">expect_identifier</span><span class="p">(</span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="kt">bool</span> <span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span> <span class="p">}</span> </pre></div> <p>Now on to the fun part, actually detecting these trees!</p> <h4 id="top-level-parse">Top-level parse</h4><p>The top-level <code>parse</code> function and its major helper, <code>parse_statement</code>, dispatch very similarly to the top-level lex function. For each statement in the file we look for function declarations, if statements, return statements, local declarations, and expression statements.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_statement</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">parsers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span> <span class="w"> </span><span class="n">parse_if</span><span class="p">,</span> <span class="w"> 
</span><span class="n">parse_expression_statement</span><span class="p">,</span> <span class="w"> </span><span class="n">parse_return</span><span class="p">,</span> <span class="w"> </span><span class="n">parse_function</span><span class="p">,</span> <span class="w"> </span><span class="n">parse_local</span><span class="p">,</span> <span class="w"> </span><span class="p">];</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">parser</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">parsers</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parser</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_some</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">res</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">None</span> <span class="p">}</span> <span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">parse</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span 
class="n">Token</span><span class="o">&gt;</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Result</span><span class="o">&lt;</span><span class="n">Ast</span><span class="p">,</span><span class="w"> </span><span class="nb">String</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ntokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">();</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">ntokens</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_statement</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_index</span><span class="p">;</span> <span class="w"> </span><span class="n">ast</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">stmt</span><span class="p">);</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Err</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Invalid token while parsing:&quot;</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb">Ok</span><span class="p">(</span><span class="n">ast</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h4 id="expression-statements">Expression statements</h4><p>Expression statements are just a wrapper for the Rust type system. 
They call <code>parse_expression</code> (which we'll define shortly), expect a semicolon afterward, and wrap the expression in a statement.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_expression_statement</span><span class="p">(</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span> <span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="o">?</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span 
class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">;</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;;&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected semicolon after expression:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span> <span class="w"> </span><span 
class="nb">Some</span><span class="p">((</span><span class="n">Statement</span>::<span class="n">Expression</span><span class="p">(</span><span class="n">expr</span><span class="p">),</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span> <span class="p">}</span> </pre></div> <h4 id="expressions">Expressions</h4><p>Expressions in this minimal Lua are only one of function calls, literals (numbers, identifiers), or binary operations. To keep things very simple, binary operations cannot be combined. So instead of <code>1 + 2 + 3</code> we'd need to do <code>local tmp1 = 1 + 2; local tmp2 = tmp1 + 3;</code> and so on.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_expression</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Expression</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Number</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">t</span><span class="p">)),</span> <span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">t</span><span class="p">)),</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>If what follows the first literal is an open parenthesis then we try to parse a function call.</p> <div 
class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;(&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past open paren</span> <span class="w"> </span><span class="c1">// Function call</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">arguments</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Expression</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span> </pre></div> <p>We need to call <code>parse_expression</code> recursively for every possible argument passed to the function.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;)&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">arguments</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma between function call arguments:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past comma</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">arg</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span> <span class="w"> </span><span class="n">arguments</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">arg</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span 
class="s">&quot;Expected valid expression in function call arguments:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past closing paren</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Expression</span>::<span class="n">FunctionCall</span><span class="p">(</span><span class="n">FunctionCall</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span>: <span class="nc">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">].</span><span class="n">clone</span><span class="p">(),</span> <span class="w"> </span><span class="n">arguments</span><span class="p">,</span> <span class="w"> </span><span class="p">}),</span> <span class="w"> </span><span class="n">next_index</span><span class="p">,</span> <span class="w"> </span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Otherwise if there isn't an opening parenthesis then we could be parsing either a literal expression or a binary operation. 
If the token that follows is an operator token then we know it's a binary operation.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">// Might be a literal expression</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">().</span><span class="n">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Operator</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">left</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Otherwise is a binary operation</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">op</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past op</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="n">next_index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">op</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid right hand side binary operand:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">rtoken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> </pre></div> <p>It is at this point that we <em>could</em> (but won't) call <code>parse_expression</code> recursively. 
I don't want to deal with operator precedence right now so we'll just require that the right hand side is another literal.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">rtoken</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Number</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">rtoken</span><span class="p">)),</span> <span class="w"> </span><span class="n">TokenKind</span>::<span class="n">Identifier</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">rtoken</span><span class="p">)),</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">rtoken</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid right hand side binary 
operand:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past right hand operand</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Expression</span>::<span class="n">BinaryOperation</span><span class="p">(</span><span class="n">BinaryOperation</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">left</span>: <span class="nb">Box</span>::<span class="n">new</span><span class="p">(</span><span class="n">left</span><span class="p">),</span> <span class="w"> </span><span class="n">right</span>: <span class="nb">Box</span>::<span class="n">new</span><span class="p">(</span><span class="n">right</span><span class="p">),</span> <span class="w"> </span><span class="n">operator</span>: <span class="nc">op</span><span class="p">,</span> <span class="w"> </span><span class="p">}),</span> <span class="w"> </span><span class="n">next_index</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>And now we're done parsing expressions!</p> <h4 id="function-declarations">Function declarations</h4><p>Functions start with the <code>function</code> keyword, and an identifier token follows.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_function</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span 
class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;function&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_identifier</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid identifier for function name:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">clone</span><span class="p">();</span> </pre></div> <p>After the function name comes the parameter list, which can be empty or a comma-separated list of identifiers.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past name</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;(&quot;</span><span class="p">)</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected open parenthesis in function declaration:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past open paren</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">parameters</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Token</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;)&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">parameters</span><span class="p">.</span><span class="n">is_empty</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span><span class="s">&quot;{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span class="n">loc</span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma or close parenthesis after parameter in function declaration:&quot;</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past comma</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">parameters</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span 
class="n">clone</span><span class="p">());</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past param</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past close paren</span> </pre></div> <p>Next we parse all statements in the function body until we find the <code>end</code> keyword.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">statements</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Statement</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;end&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_statement</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span 
class="n">next_index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">stmt</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">))</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span> <span class="w"> </span><span class="n">statements</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">stmt</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid statement in function declaration:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past end</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">FunctionDeclaration</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span><span class="p">,</span> <span class="w"> </span><span class="n">parameters</span><span class="p">,</span> <span class="w"> </span><span class="n">body</span>: <span class="nc">statements</span><span class="p">,</span> <span class="w"> </span><span class="p">}),</span> <span class="w"> </span><span class="n">next_index</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="p">}</span> </pre></div> <p>Phew! 
We're halfway through the parser.</p> <h4 id="return-statements">Return statements</h4><p>Return statements just check for the <code>return</code> keyword, an expression, and a semicolon.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_return</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;return&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span 
class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past return</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_none</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid expression in return statement:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">res</span><span class="p">.</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;;&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected semicolon in return statement:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span><span class="n">Statement</span>::<span class="n">Return</span><span 
class="p">(</span><span class="n">Return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">expression</span>: <span class="nc">expr</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="n">next_index</span><span class="p">))</span> <span class="p">}</span> </pre></div> <h4 id="local-declarations">Local declarations</h4><p>Local declarations start with the <code>local</code> keyword, then the local name, then an equal sign, then an expression, and then a semicolon.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">parse_local</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">tokens</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="n">Token</span><span class="p">],</span><span class="w"> </span><span class="n">index</span>: <span class="kt">usize</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nb">Option</span><span class="o">&lt;</span><span class="p">(</span><span class="n">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">)</span><span class="o">&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_keyword</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;local&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past local</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_identifier</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid identifier for local name:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">].</span><span 
class="n">clone</span><span class="p">();</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past name</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;=&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected = syntax after local name:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past =</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">parse_expression</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">is_none</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected valid expression in local declaration:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="p">(</span><span class="n">expr</span><span class="p">,</span><span class="w"> </span><span class="n">next_next_index</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">res</span><span class="p">.</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">next_next_index</span><span 
class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">!</span><span class="n">expect_syntax</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">next_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;;&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">println!</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">next_index</span><span class="p">]</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected semicolon in local declaration:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">None</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">next_index</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">// Skip past semicolon</span> <span class="w"> </span><span class="nb">Some</span><span class="p">((</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">Local</span><span class="p">(</span><span class="n">Local</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">name</span><span class="p">,</span> <span class="w"> </span><span class="n">expression</span>: <span 
class="nc">expr</span><span class="p">,</span> <span class="w"> </span><span class="p">}),</span> <span class="w"> </span><span class="n">next_index</span><span class="p">,</span> <span class="w"> </span><span class="p">))</span> <span class="p">}</span> </pre></div> <h4 id="if-statements">If statements</h4><p>This implementation of Lua doesn't support <code>elseif</code>, so parsing <code>if</code> just checks for the <code>if</code> keyword followed by a test expression, then the <code>then</code> keyword, then the if body (a list of statements), and then the <code>end</code> keyword.</p> <div class="highlight"><pre><span></span><span class="sx">fn</span><span class="w"> </span><span class="nl">parse_if(raw</span><span class="p">:</span><span class="w"> </span><span class="sx">&amp;[char],</span><span class="w"> </span><span class="nl">tokens</span><span class="p">:</span><span class="w"> </span><span class="sx">&amp;[Token],</span><span class="w"> </span><span class="nl">index</span><span class="p">:</span><span class="w"> </span><span class="sx">usize)</span><span class="w"> </span><span class="sx">-&gt;</span><span class="w"> </span><span class="sx">Option&lt;(Statement,</span><span class="w"> </span><span class="sx">usize)&gt;</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">index,</span><span class="w"> </span><span class="s2">&quot;if&quot;</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span> <span class="w"> </span><span class="sx">}</span> <span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">mut</span><span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="sx">index</span><span class="w"> </span><span class="sx">+</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">if</span> <span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">parse_expression(raw,</span><span class="w"> </span><span class="sx">tokens,</span><span class="w"> </span><span class="sx">next_index);</span> <span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">res.is_none()</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">println!(</span> <span class="w"> </span><span class="s2">&quot;{}&quot;</span><span class="sx">,</span> <span class="w"> </span><span class="sx">tokens[next_index]</span> <span class="w"> </span><span class="sx">.loc</span> <span class="w"> </span><span class="sx">.debug(raw,</span><span class="w"> </span><span class="s2">&quot;Expected valid expression for if test:&quot;</span><span class="sx">)</span> <span class="w"> </span><span class="sx">);</span> <span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span> <span class="w"> </span><span class="sx">}</span> <span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">(test,</span><span class="w"> </span><span class="sx">next_next_index)</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">res.unwrap();</span> <span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">next_next_index;</span> <span class="w"> </span><span 
class="sx">if</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">next_index,</span><span class="w"> </span><span class="s2">&quot;then&quot;</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span> <span class="w"> </span><span class="sx">}</span> <span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="sx">+</span><span class="p">=</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">then</span> <span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">mut</span><span class="w"> </span><span class="nl">statements</span><span class="p">:</span><span class="w"> </span><span class="sx">Vec&lt;Statement&gt;</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">vec![];</span> <span class="w"> </span><span class="sx">while</span><span class="w"> </span><span class="sx">!expect_keyword(tokens,</span><span class="w"> </span><span class="sx">next_index,</span><span class="w"> </span><span class="s2">&quot;end&quot;</span><span class="sx">)</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">let</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">parse_statement(raw,</span><span class="w"> </span><span class="sx">tokens,</span><span class="w"> </span><span class="sx">next_index);</span> <span class="w"> </span><span class="sx">if</span><span class="w"> </span><span class="sx">let</span><span class="w"> </span><span 
class="sx">Some((stmt,</span><span class="w"> </span><span class="sx">next_next_index))</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">res</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="sx">next_next_index;</span> <span class="w"> </span><span class="sx">statements.push(stmt);</span> <span class="w"> </span><span class="sx">}</span><span class="w"> </span><span class="sx">else</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">println!(</span> <span class="w"> </span><span class="s2">&quot;{}&quot;</span><span class="sx">,</span> <span class="w"> </span><span class="sx">tokens[next_index]</span> <span class="w"> </span><span class="sx">.loc</span> <span class="w"> </span><span class="sx">.debug(raw,</span><span class="w"> </span><span class="s2">&quot;Expected valid statement in if body:&quot;</span><span class="sx">)</span> <span class="w"> </span><span class="sx">);</span> <span class="w"> </span><span class="kr">return</span><span class="w"> </span><span class="sx">None;</span> <span class="w"> </span><span class="sx">}</span> <span class="w"> </span><span class="sx">}</span> <span class="w"> </span><span class="sx">next_index</span><span class="w"> </span><span class="sx">+</span><span class="p">=</span><span class="w"> </span><span class="sx">1;</span><span class="w"> </span><span class="sx">//</span><span class="w"> </span><span class="sx">Skip</span><span class="w"> </span><span class="sx">past</span><span class="w"> </span><span class="sx">end</span> <span class="w"> </span><span class="sx">Some((</span> <span class="w"> </span><span class="nl">Statement</span><span class="p">::</span><span class="nl">If(If</span><span class="w"> </span><span class="sx">{</span> <span class="w"> </span><span class="sx">test,</span> 
<span class="w"> </span><span class="nl">body</span><span class="p">:</span><span class="w"> </span><span class="sx">statements,</span> <span class="w"> </span><span class="sx">}),</span> <span class="w"> </span><span class="sx">next_index,</span> <span class="w"> </span><span class="sx">))</span> <span class="sx">}</span> </pre></div> <p>And goshdarnit we're done parsing.</p> <h3 id="compiling-to-a-made-up-virtual-machine">Compiling to a made-up virtual machine</h3><p>This virtual machine will be entirely stack-based other than the stack pointer and program counter.</p> <p>The calling convention is that the caller will put arguments on the stack, followed by the frame pointer, the program counter, and then the number of arguments (for cleanup). Then it will alter the program counter and frame pointer. Then the caller will allocate space on the stack for all arguments and locals within the function.</p> <p>For simplicity in addressing modes, the function declaration, once jumped to, will copy the arguments from before the frame pointer to in front of it (yes I know, I know, this is silly).</p> <p>The virtual machine will support add, subtract, and less-than operations as well as jump, jump-if-not-zero, return, and call. 
It will support a few more memory-specific instructions for loading literals, loading identifiers, and managing arguments.</p> <p>I'll explain the non-obvious instructions as we implement them.</p> <div class="highlight"><pre><span></span><span class="k">use</span><span class="w"> </span><span class="k">crate</span>::<span class="n">parse</span>::<span class="o">*</span><span class="p">;</span> <span class="k">use</span><span class="w"> </span><span class="n">std</span>::<span class="n">collections</span>::<span class="n">HashMap</span><span class="p">;</span> <span class="cp">#[derive(Debug)]</span> <span class="k">enum</span> <span class="nc">Instruction</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">DupPlusFP</span><span class="p">(</span><span class="kt">i32</span><span class="p">),</span> <span class="w"> </span><span class="n">MoveMinusFP</span><span class="p">(</span><span class="kt">usize</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="p">),</span> <span class="w"> </span><span class="n">MovePlusFP</span><span class="p">(</span><span class="kt">usize</span><span class="p">),</span> <span class="w"> </span><span class="n">Store</span><span class="p">(</span><span class="kt">i32</span><span class="p">),</span> <span class="w"> </span><span class="n">Return</span><span class="p">,</span> <span class="w"> </span><span class="n">JumpIfNotZero</span><span class="p">(</span><span class="nb">String</span><span class="p">),</span> <span class="w"> </span><span class="n">Jump</span><span class="p">(</span><span class="nb">String</span><span class="p">),</span> <span class="w"> </span><span class="n">Call</span><span class="p">(</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">usize</span><span class="p">),</span> <span class="w"> </span><span class="n">Add</span><span class="p">,</span> <span class="w"> </span><span 
class="n">Subtract</span><span class="p">,</span> <span class="w"> </span><span class="n">LessThan</span><span class="p">,</span> <span class="p">}</span> </pre></div> <p>The result of compiling will be a <code>Program</code> instance. This instance will contain symbol information and the actual instructions to run.</p> <div class="highlight"><pre><span></span><span class="cp">#[derive(Debug)]</span> <span class="k">struct</span> <span class="nc">Symbol</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">location</span>: <span class="kt">i32</span><span class="p">,</span> <span class="w"> </span><span class="n">narguments</span>: <span class="kt">usize</span><span class="p">,</span> <span class="w"> </span><span class="n">nlocals</span>: <span class="kt">usize</span><span class="p">,</span> <span class="p">}</span> <span class="cp">#[derive(Debug)]</span> <span class="k">pub</span><span class="w"> </span><span class="k">struct</span> <span class="nc">Program</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">syms</span>: <span class="nc">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="n">Symbol</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">instructions</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="n">Instruction</span><span class="o">&gt;</span><span class="p">,</span> <span class="p">}</span> </pre></div> <p>Compiling, similar to parsing, just calls the helper <code>compile_statement</code> for each statement in the AST.</p> <div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">compile</span><span class="p">(</span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span 
class="p">],</span><span class="w"> </span><span class="n">ast</span>: <span class="nc">Ast</span><span class="p">)</span><span class="w"> </span>-&gt; <span class="nc">Program</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">locals</span>: <span class="nc">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">pgrm</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Program</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">syms</span>: <span class="nc">HashMap</span>::<span class="n">new</span><span class="p">(),</span> <span class="w"> </span><span class="n">instructions</span>: <span class="nb">Vec</span>::<span class="n">new</span><span class="p">(),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">ast</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="o">&amp;</span><span class="k">mut</span><span class="w"> </span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="k">mut</span><span 
class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">pgrm</span> <span class="p">}</span> </pre></div> <p>And <code>compile_statement</code> dispatches to additional helpers based on the kind of statement.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_statement</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">stmt</span>: <span class="nc">Statement</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">FunctionDeclaration</span><span class="p">(</span><span class="n">fd</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">compile_declaration</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> 
</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">fd</span><span class="p">),</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">Return</span><span class="p">(</span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">compile_return</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">),</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">If</span><span class="p">(</span><span class="n">if_</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">compile_if</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">if_</span><span class="p">),</span> <span class="w"> </span><span class="n">Statement</span>::<span class="n">Local</span><span class="p">(</span><span class="n">loc</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">compile_local</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">loc</span><span class="p">),</span> <span class="w"> </span><span class="n">Statement</span>::<span 
class="n">Expression</span><span class="p">(</span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">e</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="function-declarations">Function declarations</h4><p>Let's do the hard one first. First off, function declarations will include an unconditional guard around them so that we can evaluate from the 0th instruction at the top-level and have only non-function-declaration statements be evaluated.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_declaration</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">_</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">fd</span>: <span class="nc">FunctionDeclaration</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Jump to end of function to 
guard top-level</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="s">&quot;function_done_{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">());</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span> <span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Jump</span><span class="p">(</span><span class="n">done_label</span><span class="p">.</span><span class="n">clone</span><span class="p">()));</span> </pre></div> <p>Then we'll add another limitation/simplification that local variables are only accessible within the current function scope.</p> <p>For each parameter, we'll copy the parameter on the stack before the frame pointer to a place in front of the frame pointer. 
This gets around addressing mode limitations in our virtual machine.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">new_locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HashMap</span>::<span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span>::<span class="n">new</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">function_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">parameters</span><span class="p">.</span><span class="n">len</span><span class="p">();</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">param</span><span class="p">)</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">parameters</span><span class="p">.</span><span class="n">iter</span><span class="p">().</span><span class="n">enumerate</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">MoveMinusFP</span><span class="p">(</span> <span class="w"> </span><span class="n">i</span><span class="p">,</span> <span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span> <span class="w"> </span><span class="p">));</span> <span class="w"> </span><span class="n">new_locals</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">param</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">clone</span><span class="p">(),</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we compile the body.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">body</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span 
class="w"> </span><span class="o">&amp;</span><span class="k">mut</span><span class="w"> </span><span class="n">new_locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Once the body is compiled, we know the total number of locals, so we can fill out the symbol table correctly. The location, importantly, was captured in <code>function_index</code> before the body was compiled: it is the index of the instruction where the function starts.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span> <span class="w"> </span><span class="n">fd</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span> <span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">location</span>: <span class="nc">function_index</span><span class="p">,</span> <span class="w"> </span><span class="n">narguments</span><span class="p">,</span> <span class="w"> </span><span class="n">nlocals</span>: <span class="nc">new_locals</span><span class="p">.</span><span class="n">len</span><span class="p">(),</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">);</span> </pre></div> <p>Finally, we add a symbol linking the done label for the function to the position of the end of the function. 
Again, this allows us to skip past the function declaration when evaluating instructions from 0 to N.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span> <span class="w"> </span><span class="n">done_label</span><span class="p">,</span> <span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">location</span>: <span class="nc">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span> <span class="w"> </span><span class="n">narguments</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="n">nlocals</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>OK, that wasn't so bad. And the rest are simpler.</p> <h4 id="local-declarations">Local declarations</h4><p>The local name is stored in a locals table, mapped to the current number of locals (including arguments), and then the expression for the local is compiled. 
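</p> <p>As a minimal sketch of that table (the <code>declare</code> helper is hypothetical and not part of this compiler), consider a function with one parameter <code>n</code> that declares locals <code>a</code> and <code>b</code>:</p>
<div class="highlight"><pre><span></span>use std::collections::HashMap;

// Hypothetical helper: record a name at the next free slot and return
// that slot, which doubles as the offset from the frame pointer.
fn declare(locals: &amp;mut HashMap&lt;String, i32&gt;, name: &amp;str) -&gt; i32 {
    let index = locals.len() as i32;
    locals.insert(name.to_string(), index);
    index
}

fn main() {
    let mut locals = HashMap::new();
    declare(&amp;mut locals, &quot;n&quot;); // the parameter takes slot 0
    declare(&amp;mut locals, &quot;a&quot;); // the first local takes slot 1
    declare(&amp;mut locals, &quot;b&quot;); // the second local takes slot 2

    // A later reference to `b` compiles down to the offset 2.
    // Prints: b is at frame pointer + 2
    println!(&quot;b is at frame pointer + {}&quot;, locals[&quot;b&quot;]);
}
</pre></div>
<p>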
This allows the compiler to turn <code>identifier</code> token lookups into simply an offset from the frame pointer.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_local</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">local</span>: <span class="nc">Local</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">locals</span><span class="p">.</span><span class="n">keys</span><span class="p">().</span><span class="n">len</span><span class="p">();</span> <span class="w"> </span><span class="n">locals</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span><span class="n">local</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span> <span class="w"> </span><span class="n">compile_expression</span><span 
class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">local</span><span class="p">.</span><span class="n">expression</span><span class="p">);</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">MovePlusFP</span><span class="p">(</span><span class="n">index</span><span class="p">));</span> <span class="p">}</span> </pre></div> <p>And specifically, the instruction pattern is to evaluate the expression and then copy it back into a relative position in the stack.</p> <h4 id="literals">Literals</h4><p>Number literals use the <code>store</code> instruction for pushing a number onto the stack. Identifier literals are copied to the top of the stack from their position relative to the frame pointer.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_literal</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">_</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">lit</span>: <span 
class="nc">Literal</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">lit</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Literal</span>::<span class="n">Number</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">parse</span>::<span class="o">&lt;</span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Store</span><span class="p">(</span><span class="n">n</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Literal</span>::<span class="n">Identifier</span><span class="p">(</span><span class="n">ident</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span> <span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">DupPlusFP</span><span class="p">(</span><span class="n">locals</span><span class="p">[</span><span 
class="o">&amp;</span><span class="n">ident</span><span class="p">.</span><span class="n">value</span><span class="p">]));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="function-calls">Function calls</h4><p>Pretty simple: just compile all the arguments and then issue a call instruction carrying the function name and the number of arguments.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_function_call</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">fc</span>: <span class="nc">FunctionCall</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">len</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fc</span><span class="p">.</span><span class="n">arguments</span><span class="p">.</span><span class="n">len</span><span class="p">();</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span 
class="n">fc</span><span class="p">.</span><span class="n">arguments</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span> <span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Call</span><span class="p">(</span><span class="n">fc</span><span class="p">.</span><span class="n">name</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">len</span><span class="p">));</span> <span class="p">}</span> </pre></div> <h4 id="binary-operations">Binary operations</h4><p>Binary operations compile the left, then the right, and then issue an instruction based on the operator. 
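</p> <p>As a minimal sketch of that evaluation order (the <code>Op</code> enum and <code>run</code> function are hypothetical stand-ins, not the <code>Instruction</code> type or the virtual machine from this post, and encoding "true" as 1 is an assumption), compiling <code>1 + 2 &lt; 4</code> left-then-right yields a postfix sequence that a stack machine can evaluate:</p>
<div class="highlight"><pre><span></span>// Hypothetical, simplified instruction set for illustration only.
#[derive(Debug)]
enum Op {
    Push(i32),
    Add,
    Subtract,
    LessThan,
}

// Evaluate a postfix sequence. Each operator pops the right operand,
// then the left operand, and pushes the result.
fn run(ops: &amp;[Op]) -&gt; Option&lt;i32&gt; {
    let mut stack: Vec&lt;i32&gt; = Vec::new();
    for op in ops {
        match op {
            Op::Push(n) =&gt; stack.push(*n),
            _ =&gt; {
                let right = stack.pop()?;
                let left = stack.pop()?;
                stack.push(match op {
                    Op::Add =&gt; left + right,
                    Op::Subtract =&gt; left - right,
                    Op::LessThan =&gt; (left &lt; right) as i32,
                    Op::Push(_) =&gt; unreachable!(),
                });
            }
        }
    }
    stack.pop()
}

fn main() {
    // (1 + 2) &lt; 4: left operand first, then right, then the operator.
    let program = [Op::Push(1), Op::Push(2), Op::Add, Op::Push(4), Op::LessThan];
    println!(&quot;{:?}&quot;, run(&amp;program));
}
</pre></div>
<p>Running this prints <code>Some(1)</code>, since 3 is less than 4.</p> <p>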
All the operators are builtin and act on the top two elements on the stack.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_binary_operation</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">bop</span>: <span class="nc">BinaryOperation</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">bop</span><span class="p">.</span><span class="n">left</span><span class="p">);</span> <span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">bop</span><span class="p">.</span><span class="n">right</span><span class="p">);</span> <span 
class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">bop</span><span class="p">.</span><span class="n">operator</span><span class="p">.</span><span class="n">value</span><span class="p">.</span><span class="n">as_str</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s">&quot;+&quot;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Add</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="s">&quot;-&quot;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Subtract</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="s">&quot;&lt;&quot;</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">LessThan</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="fm">panic!</span><span class="p">(</span> <span class="w"> 
</span><span class="s">&quot;{}&quot;</span><span class="p">,</span> <span class="w"> </span><span class="n">bop</span><span class="p">.</span><span class="n">operator</span> <span class="w"> </span><span class="p">.</span><span class="n">loc</span> <span class="w"> </span><span class="p">.</span><span class="n">debug</span><span class="p">(</span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Unable to compile binary operation:&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="expressions">Expressions</h4><p>Compiling expressions just dispatches to a compile helper based on the type of expression. We've already written those three helpers.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_expression</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">exp</span>: <span class="nc">Expression</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="n">exp</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="n">Expression</span>::<span class="n">BinaryOperation</span><span class="p">(</span><span class="n">bop</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_binary_operation</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">bop</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Expression</span>::<span class="n">FunctionCall</span><span class="p">(</span><span class="n">fc</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_function_call</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">fc</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Expression</span>::<span class="n">Literal</span><span class="p">(</span><span class="n">lit</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_literal</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">lit</span><span 
class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="if">If</h4><p>First we compile the conditional test and then we jump to after the if when the test result is zero (that is, when the condition is false). Despite its name, <code>JumpIfNotZero</code> is handled in the virtual machine by jumping when the popped value is zero.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_if</span><span class="p">(</span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span><span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="n">if_</span>: <span class="nc">If</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">if_</span><span class="p">.</span><span class="n">test</span><span class="p">);</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">done_label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">format!</span><span class="p">(</span><span class="s">&quot;if_else_{}&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">());</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span> <span class="w"> </span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">JumpIfNotZero</span><span class="p">(</span><span class="n">done_label</span><span class="p">.</span><span class="n">clone</span><span class="p">()));</span> </pre></div> <p>Then we compile the body.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">stmt</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">if_</span><span class="p">.</span><span class="n">body</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_statement</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">stmt</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And finally make sure we insert the <code>done</code> symbol in the right place after the if.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">.</span><span class="n">insert</span><span class="p">(</span> <span class="w"> </span><span class="n">done_label</span><span class="p">,</span> <span class="w"> </span><span class="n">Symbol</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">location</span>: <span class="nc">pgrm</span><span 
class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span> <span class="w"> </span><span class="n">nlocals</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="n">narguments</span>: <span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">);</span> <span class="p">}</span> </pre></div> <h4 id="return">Return</h4><p>The final statement type is return. We simply compile the return expression and issue a return instruction.</p> <div class="highlight"><pre><span></span><span class="k">fn</span> <span class="nf">compile_return</span><span class="p">(</span> <span class="w"> </span><span class="n">pgrm</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">Program</span><span class="p">,</span> <span class="w"> </span><span class="n">raw</span>: <span class="kp">&amp;</span><span class="p">[</span><span class="kt">char</span><span class="p">],</span> <span class="w"> </span><span class="n">locals</span>: <span class="kp">&amp;</span><span class="nc">mut</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;</span><span class="nb">String</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="o">&gt;</span><span class="p">,</span> <span class="w"> </span><span class="n">ret</span>: <span class="nc">Return</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">compile_expression</span><span class="p">(</span><span class="n">pgrm</span><span class="p">,</span><span 
class="w"> </span><span class="n">raw</span><span class="p">,</span><span class="w"> </span><span class="n">locals</span><span class="p">,</span><span class="w"> </span><span class="n">ret</span><span class="p">.</span><span class="n">expression</span><span class="p">);</span> <span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Instruction</span>::<span class="n">Return</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>That's it for the compiler! Now the trickiest part. I lost a few hours debugging and iterating on the next bit.</p> <h3 id="the-virtual-machine">The virtual machine</h3><p>Ok so the easy part is that there are only two registers, a program counter and a frame pointer. There's also a data stack. The frame pointer points to the location on the data stack where each function can start storing its locals.</p> <p>Evaluation starts from 0 and goes until the last instruction.</p> <div class="highlight"><pre><span></span><span class="k">pub</span><span class="w"> </span><span class="k">fn</span> <span class="nf">eval</span><span class="p">(</span><span class="n">pgrm</span>: <span class="nc">Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">pc</span>: <span class="kt">i32</span> <span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">fp</span>: <span class="kt">i32</span> <span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> 
</span><span class="k">mut</span><span class="w"> </span><span class="n">data</span>: <span class="nb">Vec</span><span class="o">&lt;</span><span class="kt">i32</span><span class="o">&gt;</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="fm">vec!</span><span class="p">[];</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">match</span><span class="w"> </span><span class="o">&amp;</span><span class="n">pgrm</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">pc</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">]</span><span class="w"> </span><span class="p">{</span> </pre></div> <p>Each instruction will be responsible for incrementing the program counter or having it jump around.</p> <h4 id="addition,-subtraction,-less-than">Addition, subtraction, less than</h4><p>The easiest ones are the math operators. 
We just pop off the data stack, perform the operation, and store the result.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Add</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Instruction</span>::<span class="n">Subtract</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span 
class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">left</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Instruction</span>::<span class="n">LessThan</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span 
class="n">right</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>The <code>store</code> instruction is another easy one. It just pushes a literal number onto the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Store</span><span class="p">(</span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="o">*</span><span class="n">n</span><span class="p">);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="jump-variants">Jump variants</h4><p>The jump variants are easy too. Just grab the location and change the program counter. 
If it's a conditional jump then test the condition first.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">JumpIfNotZero</span><span class="p">(</span><span class="n">label</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">top</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Instruction</span>::<span class="n">Jump</span><span class="p">(</span><span class="n">label</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span 
class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="loading-from-a-variable">Loading from a variable</h4><p>The <code>DupPlusFP</code> instruction copies a value from the stack (at an offset from the frame pointer) onto the top of the stack. This is for references to arguments and locals.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">DupPlusFP</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">data</span><span class="p">[(</span><span class="n">fp</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">]);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div>
<h4 id="storing-locals">Storing locals</h4><p>The <code>MovePlusFP</code> instruction is used by <code>compile_locals</code> to store a local, once its value has been compiled onto the stack, at its position relative to the frame pointer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">MovePlusFP</span><span class="p">(</span><span class="n">i</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">val</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="o">*</span><span class="n">i</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Accounts for top-level locals</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">index</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">val</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="duplicating-arguments">Duplicating arguments</h4><p>The <code>MoveMinusFP</code> instruction is, 
again, a hack to work around limited addressing modes in this minimal virtual machine. It copies arguments from behind the frame pointer to in front of the frame pointer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">MoveMinusFP</span><span class="p">(</span><span class="n">local_offset</span><span class="p">,</span><span class="w"> </span><span class="n">fp_offset</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">[</span><span class="n">fp</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">local_offset</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">[(</span><span class="n">fp</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">fp_offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span><span class="p">))</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">usize</span><span class="p">];</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now we're down to the last two instructions: call and return.</p> <h4 id="call">Call</h4><p>Call has a special dispatch for builtin functions (the only one that exists being <code>print</code>).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span 
class="n">Call</span><span class="p">(</span><span class="n">label</span><span class="p">,</span><span class="w"> </span><span class="n">narguments</span><span class="p">)</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Handle builtin functions</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="n">label</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;print&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">_</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="mi">0</span><span class="o">..*</span><span class="n">narguments</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="fm">print!</span><span class="p">(</span><span class="s">&quot;{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">());</span> <span class="w"> </span><span class="fm">print!</span><span class="p">(</span><span class="s">&quot; &quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="fm">println!</span><span class="p">();</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Otherwise it pushes the current frame pointer, then the program counter, and finally the number of arguments (not locals) onto the stack for preservation. 
Then it sets up the new program counter and frame pointer and creates space for all locals and arguments after the new frame pointer.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">fp</span><span class="p">);</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">narguments</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">location</span><span class="p">;</span> <span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Set up space for all arguments/locals</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">nlocals</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pgrm</span><span class="p">.</span><span class="n">syms</span><span class="p">[</span><span class="n">label</span><span class="p">].</span><span class="n">nlocals</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">nlocals</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">nlocals</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="return">Return</h4><p>The return instruction pops the return value from the stack. Then it pops off all locals and argument copies stored after the frame pointer. Then it restores the program counter and frame pointer, and pops off the arguments before the frame pointer. 
Finally it adds the return value back onto the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">Instruction</span>::<span class="n">Return</span><span class="w"> </span><span class="o">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="n">ret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Clean up the local stack</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">len</span><span class="p">()</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Restore pc and fp</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="k">mut</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span 
class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="n">fp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">().</span><span class="n">unwrap</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Clean up arguments</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span> <span class="w"> </span><span class="n">narguments</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Add back return value</span> <span class="w"> </span><span class="n">data</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And yes, this implementation would be more efficient if instead of literally pushing and popping we just incremented/decremented a stack pointer.</p> <p>And that's it! We're completely done with a basic parser, compiler, and virtual machine for a subset of Lua. Is it janky? Yeah. Is it simple? Kind of? Does it work? It seems to!</p> <h3 id="summary">Summary</h3><p>Ok, we've got &lt;1200 lines of Rust, enough to run some decent Lua programs. 
We run this fib program against this implementation and against Lua 5.4.3 (which isn't LuaJIT) and what do we see?</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cargo<span class="w"> </span>build<span class="w"> </span>--release $<span class="w"> </span>cat<span class="w"> </span>test/fib.lua <span class="k">function</span><span class="w"> </span>fib<span class="o">(</span>n<span class="o">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span>n<span class="w"> </span>&lt;<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="k">then</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>n<span class="p">;</span> <span class="w"> </span>end <span class="w"> </span><span class="nb">local</span><span class="w"> </span><span class="nv">n1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>fib<span class="o">(</span>n-1<span class="o">)</span><span class="p">;</span> <span class="w"> </span><span class="nb">local</span><span class="w"> </span><span class="nv">n2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>fib<span class="o">(</span>n-2<span class="o">)</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>n1<span class="w"> </span>+<span class="w"> </span>n2<span class="p">;</span> end print<span class="o">(</span>fib<span class="o">(</span><span class="m">30</span><span class="o">))</span><span class="p">;</span> $<span class="w"> </span><span class="nb">time</span><span class="w"> </span>./target/release/lust<span class="w"> </span>test/fib.lua <span class="m">832040</span> ./target/release/lust<span class="w"> </span>test/fib.lua<span class="w"> </span><span class="m">0</span>.29s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span 
class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.293<span class="w"> </span>total $<span class="w"> </span><span class="nb">time</span><span class="w"> </span>lua<span class="w"> </span>test/fib.lua <span class="m">832040</span> lua<span class="w"> </span>test/fib.lua<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.00s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.063<span class="w"> </span>total </pre></div> <p>This implementation is a bit slower: between 4x and 5x on this run (0.293s versus 0.063s total). Time to do some profiling and maybe revisit some of those aforementioned inefficiencies.</p> <p class="note"> Big thanks to <a href="https://twitter.com/christianfscott/status/1475832498663792640">Christian Scott on Twitter</a> for pointing out I should not be benchmarking with debug builds! <br /><br /> And thanks to <a href="https://www.reddit.com/r/rust/comments/rqgm8t/comment/hqbwgwj/">reddit123123123123 on Reddit</a> for suggesting I use <code>cargo clippy</code> to clean up my code. <br /><br /> Thanks to <a href="https://github.com/eatonphil/lust/issues/1">GiffE on GitHub</a> for pointing out some key inconsistencies between this implementation and Lua. I won't modify anything because a perfect Lua subset wasn't the goal, but I'm sharing because it was good analysis and criticism of this implementation. 
</p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new blog post on parsing, compiling, and virtual machine evaluation for a super minimal Lua implementation written from scratch in Rust!<a href="https://t.co/8qFviEecJo">https://t.co/8qFviEecJo</a> <a href="https://t.co/d1MGArlErR">pic.twitter.com/d1MGArlErR</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1475828516835008513?ref_src=twsrc%5Etfw">December 28, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/lua-in-rust.htmlTue, 28 Dec 2021 00:00:00 +0000Running SQL Server in a container on Github Actionshttp://notes.eatonphil.com/sqlserver-in-github-actions.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-12-16-sqlserver-in-github-actions.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-12-16-sqlserver-in-github-actions.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/sqlserver-in-github-actions.htmlThu, 16 Dec 2021 00:00:00 +0000Implementing zip archiving in Golang: unzippinghttp://notes.eatonphil.com/implementing-zip-in-go-unzipping.html<p><small>All code for this post is <a href="https://github.com/eatonphil/gozip">available on Github</a>.</small></p> <p>Let's take a look at how zip files work. Take a small file for example:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>hello.text Hello! 
</pre></div> <p>Let's zip it up.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>zip<span class="w"> </span>test.zip<span class="w"> </span>hello.text adding:<span class="w"> </span>hello.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span> $<span class="w"> </span>ls<span class="w"> </span>-lah<span class="w"> </span>test.zip -rw-r--r--<span class="w"> </span><span class="m">1</span><span class="w"> </span>phil<span class="w"> </span>phil<span class="w"> </span><span class="m">177</span><span class="w"> </span>Nov<span class="w"> </span><span class="m">23</span><span class="w"> </span><span class="m">23</span>:04<span class="w"> </span>test.zip </pre></div> <p>So a 7 byte text file (six characters plus a trailing newline) becomes a 177 byte zip file. That is pretty small! Parsing 177 bytes sounds like it can't possibly be too complicated!</p> <p>Let's hexdump the zip file.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>hexdump<span class="w"> </span>-C<span class="w"> </span>test.zip <span class="m">00000000</span><span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>8a<span class="w"> </span>b8<span class="w"> </span><span class="m">77</span><span class="w"> </span><span class="m">53</span><span class="w"> </span>9e<span class="w"> </span>d8<span class="w"> </span><span class="p">|</span>PK..........wS..<span class="p">|</span> <span class="m">00000010</span><span class="w"> </span><span class="m">42</span><span class="w"> </span>b0<span class="w"> </span><span
class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span>1c<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">68</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="p">|</span>B.............he<span class="p">|</span> <span class="m">00000020</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span>2e<span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="m">78</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">54</span><span class="w"> </span><span class="m">09</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="p">|</span>llo.textUT...ts.<span class="p">|</span> <span class="m">00000030</span><span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">75</span><span class="w"> </span><span class="m">78</span><span class="w"> </span>0b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span 
class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">04</span><span class="w"> </span><span class="p">|</span>ats.aux.........<span class="p">|</span> <span class="m">00000040</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">65</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span><span class="m">21</span><span class="w"> </span>0a<span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">02</span><span class="w"> </span>1e<span class="w"> </span><span class="p">|</span>....Hello!.PK...<span class="p">|</span> <span class="m">00000050</span><span class="w"> </span><span class="m">03</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>8a<span class="w"> </span>b8<span class="w"> </span><span class="m">77</span><span class="w"> </span><span class="m">53</span><span class="w"> </span>9e<span class="w"> </span>d8<span class="w"> </span><span class="m">42</span><span class="w"> </span>b0<span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="p">|</span>.........wS..B..<span class="p">|</span> <span class="m">00000060</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span 
class="m">00</span><span class="w"> </span><span class="m">07</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>0a<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">18</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="p">|</span>................<span class="p">|</span> <span class="m">00000070</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>a4<span class="w"> </span><span class="m">81</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">68</span><span class="w"> </span><span class="m">65</span><span class="w"> </span>6c<span class="w"> </span>6c<span class="w"> </span>6f<span class="w"> </span>2e<span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="p">|</span>.........hello.t<span class="p">|</span> <span class="m">00000080</span><span class="w"> </span><span class="m">65</span><span class="w"> </span><span class="m">78</span><span class="w"> </span><span class="m">74</span><span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">54</span><span class="w"> </span><span class="m">05</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">74</span><span class="w"> 
</span><span class="m">73</span><span class="w"> </span>9d<span class="w"> </span><span class="m">61</span><span class="w"> </span><span class="m">75</span><span class="w"> </span><span class="m">78</span><span class="w"> </span>0b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>extUT...ts.aux..<span class="p">|</span> <span class="m">00000090</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">04</span><span class="w"> </span>eb<span class="w"> </span><span class="m">03</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">50</span><span class="w"> </span>4b<span class="w"> </span><span class="m">05</span><span class="w"> </span><span class="m">06</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>...........PK...<span class="p">|</span> 000000a0<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">01</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">50</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>4b<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> 
</span><span class="p">|</span>.......P...K....<span class="p">|</span> 000000b0<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="p">|</span>.<span class="p">|</span> 000000b1 </pre></div> <p>We can see both the file name and the file contents in there.</p> <h3 id="structure">Structure</h3><p>Let's take a look at the zip structure defined <a href="https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT">here</a>. Based on section 4.3.6 it looks like file metadata followed by the file contents are stored one after another with a final chunk of "central directory" metadata.</p> <div style="text-align:center"> <img src="https://www.codeproject.com/KB/cs/remotezip/diagram1.png" style="height:400px; width: auto" /> <div> <small><a href="https://www.codeproject.com/Articles/8688/Extracting-files-from-a-remote-ZIP-archive">Image Credit</a></small> </div> </div><p>The local header metadata looks like this:</p> <table> <thead><tr> <th>Field</th> <th>Size</th> </tr> </thead> <tbody> <tr> <td>local file header signature</td> <td>4 bytes</td> </tr> <tr> <td>version needed to extract</td> <td>2 bytes</td> </tr> <tr> <td>general purpose bit flag</td> <td>2 bytes</td> </tr> <tr> <td>compression method</td> <td>2 bytes</td> </tr> <tr> <td>last mod file time</td> <td>2 bytes</td> </tr> <tr> <td>last mod file date</td> <td>2 bytes</td> </tr> <tr> <td>crc-32</td> <td>4 bytes</td> </tr> <tr> <td>compressed size</td> <td>4 bytes</td> </tr> <tr> <td>uncompressed size</td> <td>4 bytes</td> </tr> <tr> <td>file name length</td> <td>2 bytes</td> </tr> <tr> <td>extra field length</td> <td>2 bytes</td> </tr> <tr> <td>file name</td> <td>variable</td> </tr> <tr> <td>extra field</td> <td>variable</td> </tr> </tbody> </table> <p>The header signature is a single integer (<code>0x04034b50</code>) in a valid zip file. We'll ignore version, the general purpose flag, and the checksum. 
The spec defines more compression methods, but we'll only handle two: <code>0</code> for no compression and <code>8</code> for DEFLATE.</p> <p>The last modified time and date use the MS-DOS date/time format, which is <a href="https://groups.google.com/g/comp.os.msdos.programmer/c/ffAVUFN2NbA">pretty funky</a>.</p> <p>Let's translate this roughly to Go with some high-level flourishes.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;compress/flate&quot;</span> <span class="w"> </span><span class="s">&quot;io/ioutil&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/binary&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="kt">uint8</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">noCompression</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">deflateCompression</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">localFileHeader</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">signature</span><span class="w"> </span><span class="kt">uint32</span> <span class="w"> </span><span class="nx">version</span><span class="w"> </span><span
class="kt">uint16</span> <span class="w"> </span><span class="nx">bitFlag</span><span class="w"> </span><span class="kt">uint16</span> <span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="nx">compression</span> <span class="w"> </span><span class="nx">lastModified</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span> <span class="w"> </span><span class="nx">crc32</span><span class="w"> </span><span class="kt">uint32</span> <span class="w"> </span><span class="nx">compressedSize</span><span class="w"> </span><span class="kt">uint32</span> <span class="w"> </span><span class="nx">uncompressedSize</span><span class="w"> </span><span class="kt">uint32</span> <span class="w"> </span><span class="nx">fileName</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">extraField</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> </pre></div> <h3 id="main">main</h3><p>Our entrypoint will read a zip file and keep walking through the file until we stop being able to parse zip file entries.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">lfh</span><span class="w"> </span><span class="o">*</span><span class="nx">localFileHeader</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">next</span><span class="w"> </span><span class="kt">int</span> <span class="w"> </span><span class="nx">lfh</span><span class="p">,</span><span class="w"> </span><span class="nx">next</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLocalFileHeader</span><span class="p">(</span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">end</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">==</span><span 
class="w"> </span><span class="nx">errNotZip</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">next</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">lfh</span><span class="p">.</span><span class="nx">lastModified</span><span class="p">,</span><span class="w"> </span><span class="nx">lfh</span><span class="p">.</span><span class="nx">fileName</span><span class="p">,</span><span class="w"> </span><span class="nx">lfh</span><span class="p">.</span><span class="nx">fileContents</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="files">Files</h3><p>For each file we'll fail early if the first four bytes are not the magic zip signature.</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">errNotZip</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span 
class="p">(</span><span class="s">&quot;Not a zip file&quot;</span><span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">parseLocalFileHeader</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">localFileHeader</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">signature</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readUint32</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">signature</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mh">0x04034b50</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errNotZip</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span 
class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>The basic pattern is that one of these read helpers will take an offset and return a Go value and a new offset. The read helper will do bounds checking. We'll define the read helpers further down.</p> <p>Let's follow the same pattern to the end of the struct:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">version</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">bitFlag</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">compression</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nv">noCompression</span> <span class="w"> </span><span class="nv">compressionRaw</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nv">compressionRaw</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">compression</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nv">deflateCompression</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">lmTime</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">lmDate</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> 
</span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">lastModified</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">msdosTimeToGoTime</span><span class="p">(</span><span class="nv">lmDate</span><span class="p">,</span><span class="w"> </span><span class="nv">lmTime</span><span class="p">)</span> <span class="w"> </span><span class="nv">crc32</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">compressedSize</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">uncompressedSize</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint32</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">fileNameLength</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> 
</span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">extraFieldLength</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readUint16</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">fileName</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span 
class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readString</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nf">int</span><span class="p">(</span><span class="nv">fileNameLength</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nv">extraField</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nf">readBytes</span><span class="p">(</span><span class="nv">bs</span><span class="p">,</span><span class="w"> </span><span class="nv">i</span><span class="p">,</span><span class="w"> </span><span class="nf">int</span><span class="p">(</span><span class="nv">extraFieldLength</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nv">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">return</span><span class="w"> </span><span class="nv">nil</span><span class="p">,</span><span 
class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nv">err</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Now if the file contents are uncompressed we can just copy bytes after the file header. If the file contents are compressed though we'll use Go's builtin DEFLATE support to decompress the bytes after the file header.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">compression</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">noCompression</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fileContents</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">readString</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">uncompressedSize</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">compressedSize</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">flateReader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">flate</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bs</span><span class="p">[</span><span class="nx">i</span><span class="p">:</span><span class="nx">end</span><span class="p">]))</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">flateReader</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadAll</span><span class="p">(</span><span class="nx">flateReader</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fileContents</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">read</span><span class="p">)</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">end</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And return the filled out representation:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">localFileHeader</span><span class="p">{</span> <span class="w"> </span><span class="nx">signature</span><span class="p">:</span><span class="w"> </span><span class="nx">signature</span><span class="p">,</span> <span class="w"> </span><span class="nx">version</span><span class="p">:</span><span class="w"> </span><span class="nx">version</span><span class="p">,</span> <span class="w"> </span><span class="nx">bitFlag</span><span class="p">:</span><span class="w"> </span><span class="nx">bitFlag</span><span class="p">,</span> <span class="w"> </span><span 
class="nx">compression</span><span class="p">:</span><span class="w"> </span><span class="nx">compression</span><span class="p">,</span> <span class="w"> </span><span class="nx">lastModified</span><span class="p">:</span><span class="w"> </span><span class="nx">lastModified</span><span class="p">,</span> <span class="w"> </span><span class="nx">crc32</span><span class="p">:</span><span class="w"> </span><span class="nx">crc32</span><span class="p">,</span> <span class="w"> </span><span class="nx">compressedSize</span><span class="p">:</span><span class="w"> </span><span class="nx">compressedSize</span><span class="p">,</span> <span class="w"> </span><span class="nx">uncompressedSize</span><span class="p">:</span><span class="w"> </span><span class="nx">uncompressedSize</span><span class="p">,</span> <span class="w"> </span><span class="nx">fileName</span><span class="p">:</span><span class="w"> </span><span class="nx">fileName</span><span class="p">,</span> <span class="w"> </span><span class="nx">extraField</span><span class="p">:</span><span class="w"> </span><span class="nx">extraField</span><span class="p">,</span> <span class="w"> </span><span class="nx">fileContents</span><span class="p">:</span><span class="w"> </span><span class="nx">fileContents</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="read-helpers">Read helpers</h3><p>Now we just define those read helpers with bounds checking, using Go's builtin libraries for dealing with binary encodings.</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">errOverranBuffer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span 
class="p">(</span><span class="s">&quot;Overran buffer&quot;</span><span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readUint32</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint32</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">4</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint32</span><span class="p">(</span><span class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span 
class="p">:</span><span class="nx">end</span><span class="p">]),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readUint16</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint16</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="mi">2</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">LittleEndian</span><span class="p">.</span><span class="nx">Uint16</span><span class="p">(</span><span 
class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span class="p">:</span><span class="nx">end</span><span class="p">]),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Grabbing raw bytes and strings needs little more than the bounds check.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">end</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">bs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span 
class="p">,</span><span class="w"> </span><span class="nx">errOverranBuffer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bs</span><span class="p">[</span><span class="nx">offset</span><span class="p">:</span><span class="nx">offset</span><span class="o">+</span><span class="nx">n</span><span class="p">],</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readString</span><span class="p">(</span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">read</span><span class="p">,</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">bs</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nb">string</span><span class="p">(</span><span class="nx">read</span><span class="p">),</span><span class="w"> </span><span class="nx">end</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="p">}</span> </pre></div> <h3 id="msdos-time">MSDOS time</h3><p>At the time zip was created, MSDOS time format was popular, I guess. But it's not popular today so it took a bit of work to finally find <a href="https://groups.google.com/g/comp.os.msdos.programmer/c/ffAVUFN2NbA">an explanation of the format</a> with some code (in C).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">msdosTimeToGoTime</span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="kt">uint16</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="kt">uint16</span><span class="p">)</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Time</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">seconds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">t</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0x1F</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="nx">minutes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">t</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span 
class="mh">0x3F</span><span class="p">)</span> <span class="w"> </span><span class="nx">hours</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">11</span><span class="p">)</span> <span class="w"> </span><span class="nx">day</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0x1F</span><span class="p">)</span> <span class="w"> </span><span class="nx">month</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Month</span><span class="p">((</span><span class="nx">d</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">5</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0x0F</span><span class="p">)</span> <span class="w"> </span><span class="nx">year</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">((</span><span class="nx">d</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0x7F</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1980</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Date</span><span class="p">(</span><span class="nx">year</span><span 
class="p">,</span><span class="w"> </span><span class="nx">month</span><span class="p">,</span><span class="w"> </span><span class="nx">day</span><span class="p">,</span><span class="w"> </span><span class="nx">hours</span><span class="p">,</span><span class="w"> </span><span class="nx">minutes</span><span class="p">,</span><span class="w"> </span><span class="nx">seconds</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Local</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h3 id="tout-ensemble">Tout ensemble</h3><p>Running it we get:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build $<span class="w"> </span>./gozip<span class="w"> </span>test.zip <span class="m">2021</span>-11-23<span class="w"> </span><span class="m">23</span>:04:20<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>hello.text<span class="w"> </span>Hello! </pre></div> <p>That looks good! Now let's try zipping more than one file.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>bye.text Au<span class="w"> </span>revoir! 
$<span class="w"> </span>rm<span class="w"> </span>test.zip $<span class="w"> </span>zip<span class="w"> </span>test.zip<span class="w"> </span>*.text <span class="w"> </span>adding:<span class="w"> </span>bye.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span> <span class="w"> </span>adding:<span class="w"> </span>hello.text<span class="w"> </span><span class="o">(</span>stored<span class="w"> </span><span class="m">0</span>%<span class="o">)</span> $<span class="w"> </span>./gozip<span class="w"> </span>test.zip <span class="m">2021</span>-11-24<span class="w"> </span><span class="m">03</span>:40:00<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>bye.text<span class="w"> </span>Au<span class="w"> </span>revoir! <span class="m">2021</span>-11-23<span class="w"> </span><span class="m">23</span>:04:20<span class="w"> </span>+0000<span class="w"> </span>UTC<span class="w"> </span>hello.text<span class="w"> </span>Hello! </pre></div> <p>Fab.</p> <h3 id="notes">Notes</h3><p>There are many parts of the standard to deal with (e.g. directories) and many common extensions. I'm ignoring them.</p> <p>There's some space left at the end of the file which is probably the "central directory" metadata but I haven't dug into that. 
Understanding those last remaining bits is probably necessary if I want to be able to <em>create</em> zip archives.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a new post on building a zip archive reader in Go!<a href="https://t.co/U0Yg2powlP">https://t.co/U0Yg2powlP</a> <a href="https://t.co/ns5dF3mjIx">pic.twitter.com/ns5dF3mjIx</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1463354752675323904?ref_src=twsrc%5Etfw">November 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/implementing-zip-in-go-unzipping.htmlTue, 23 Nov 2021 00:00:00 +0000Benchmarking esbuild, swc, tsc, and babel for React/JSX projectshttp://notes.eatonphil.com/benchmarking-esbuild-swc-typescript-babel.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-11-13-benchmarking-esbuild-swc-typescript-babel.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-11-13-benchmarking-esbuild-swc-typescript-babel.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/benchmarking-esbuild-swc-typescript-babel.htmlSat, 13 Nov 2021 00:00:00 +0000Building a fast SCSS-like rule expander for CSS using fuzzy parsinghttp://notes.eatonphil.com/building-a-nested-css-rule-expander.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html'" /> </head><p>This is an external post of mine. 
Click <a href="https://datastation.multiprocess.io/blog/2021-10-31-building-a-nested-css-rule-expander.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/building-a-nested-css-rule-expander.htmlSun, 31 Oct 2021 00:00:00 +0000Exploring PL/pgSQL part two: implementing a Forth-like interpreterhttp://notes.eatonphil.com/exploring-plpgsql-forth-like.html<p class="note"> Previously in exploring PL/pgSQL: <br /> <a href="exploring-plpgsql.html">Strings, arrays, recursion and parsing JSON</a> </p><p>In my <a href="https://notes.eatonphil.com/exploring-plpgsql.html">last post</a> I walked through the basics of PL/pgSQL, the embedded procedural language inside of PostgreSQL. It covered simple functions, recursions and parsing. But there was something very obviously missing from that post: a working interpreter.</p> <p>So in this post we'll walk through building a Forth-like language from scratch in PL/pgSQL. We'll be able to write a fibonacci function in this Forth-like language and have it be evaluated correctly like so:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;</span> <span class="s2">DEF fib</span> <span class="s2"> DUP 1 &gt; IF</span> <span class="s2"> 1- DUP 1- fib CALL SWAP fib CALL + THEN</span> <span class="s2"> RET</span> <span class="s2">20 fib CALL</span> <span class="s2">EXIT&#39;)&quot;</span> ... <span class="w"> </span>sm_run -------- <span class="w"> </span><span class="m">6765</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>All code is available on <a href="https://github.com/eatonphil/exploring-plpgsql/blob/main/sm.sql">Github</a>.</p> <h3 id="forth">Forth</h3><p><a href="https://www.forth.com/resources/forth-programming-language/">Forth</a> is a stack-oriented language. Literals are pushed onto the stack. 
Functions and builtins operate on the stack.</p> <p>For example:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;3 2 + EXIT&#39;)&quot;</span> </pre></div> <p>Will produce <code>5</code>. And:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;3 2 + 1 - EXIT&#39;)&quot;</span> </pre></div> <p>Will produce <code>4</code>.</p> <p>Our code will notably not be a real Forth, since there are many special features of a real Forth. But it will look like one to a novice Forth programmer like myself.</p> <p>You can read more about Forth basics <a href="https://skilldrick.github.io/easyforth/">here</a>. And you can read a truly stunning, real Forth implementation in <a href="https://github.com/nornagon/jonesforth/blob/master/jonesforth.S">jonesforth.S</a>. 
Or you can pick up <a href="https://letoverlambda.com/">Let Over Lambda</a> for a fantastic book on Common Lisp that culminates in a Forth interpreter.</p> <h3 id="implementation">Implementation</h3><p>Since the builtin <code>array_length($arr, $dim)</code> returns <code>NULL</code> if the array is <code>NULL</code> and our dimension is always 1, we'll write a helper.</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">sm_alength</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">sm_alength</span><span class="p">(</span><span class="n">a</span><span class="w"> </span><span class="nb">text</span><span class="p">[])</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>We'll also need to bring in the <code>hstore</code> extension so we can map function names to their positions. 
(We could use an association list but those are less programmer-friendly.)</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="n">EXTENSION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">hstore</span><span class="p">;</span> </pre></div> <p>Our interpreter function will take a string to evaluate, splitting the string on whitespace into tokens.</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">sm_run</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">sm_run</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">tokens</span><span class="w"> </span><span class="nb">text</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">regexp_split_to_array</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;\s+&#39;</span><span class="p">);</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Data stack</span> <span class="w"> </span><span 
class="n">defs</span><span class="w"> </span><span class="n">hstore</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Map of functions to location</span> <span class="w"> </span><span class="n">tmps</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Array we can use for temporary variables</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current token</span> <span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="nb">text</span><span class="p">[];</span><span class="w"> </span><span class="c1">-- Return pointer stack, always ints but easier to store as text</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Program counter</span> <span class="k">BEGIN</span> </pre></div> <p>We set up a <code>tmps</code> array because each builtin may need a different number of temporary variables and PL/pgSQL makes ad-hoc variables cumbersome (or at least, if an easier way exists, it is outside my knowledge).</p> <p>And we store the return pointer stack as a text array so that we can use <code>sm_alength</code> on it, even though values in this array will always be integers.</p> <p>Next we'll start an infinite loop to evaluate the program. 
The only thing that will stop the loop is the <code>EXIT</code> builtin, which returns from this function with the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">];</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">&#39;[Debug] Current token: %. Current stack: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">stack</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;PC out of bounds.&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;EXIT&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span 
class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="n">TODO</span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>If no other condition is met (the token is not a builtin), we push it onto the data stack and increment the program counter.</p> <h3 id="conditionals">Conditionals</h3><p>The <code>IF</code> builtin pops the top of the stack. If it is true evaluation continues. If it is false evaluation skips ahead until after a <code>THEN</code> builtin.</p> <p>For example:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;1 1 1 = IF 2 THEN EXIT&#39;)&quot;</span> </pre></div> <p>Produces <code>2</code>. 
But</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;1 1 0 = IF 2 THEN EXIT&#39;)&quot;</span> </pre></div> <p>Produces <code>1</code>.</p> <h3 id="implementation">Implementation</h3><p>Joining the <code>EXIT</code> condition in the interpreter loop, we get:</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">true</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;IF&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab last item from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span 
class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">boolean</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;THEN&#39;</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Skip past THEN</span> <span class="w"> </span><span class="k">ELSE</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> 
</span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;THEN&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Just skip past it</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;EXIT&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> </pre></div> <h3 id="other-builtins">Other builtins</h3><p>The <code>DUP</code> builtin makes a copy of the top of the stack. The <code>SWAP</code> builtin swaps the order of the top two items on the stack. 
And the <code>1-</code> builtin subtracts 1 from the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;DUP&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab item</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Add it to the stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">stack</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;1-&#39;</span><span class="w"> </span><span 
class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab item</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Rewrite top of stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;SWAP&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Swap the two</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span 
class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> </pre></div> <p>It's important that every builtin handles incrementing the program counter and skipping to the beginning of the loop itself, because some builtins increment the program counter under different conditions (like <code>IF</code> above).</p> <p>The last few builtins are the simplest: arithmetic and comparison operations that produce integers or booleans.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;=&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&gt;&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span 
class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span 
class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> 
<span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span 
class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;/&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab two items from stack</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span 
class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Replace last item on stack</span> <span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> 
</span><span class="p">...</span> </pre></div> <h3 id="function-definitions">Function definitions</h3><p>Functions here will differ from Forth, borrowing elements of machine code. Return pointers will be stored in a dedicated return pointer stack. We could store it on the data stack but that would require more effort on the part of the programmer to restore the stack. Calling <code>RET</code> inside a function pops a return pointer off the return pointer stack.</p> <p>Here's a simple function definition: <code>DEF plus + RET</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;DEF&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="o">+</span><span class="mi">1</span><span class="p">];</span><span class="w"> </span><span class="c1">-- function name</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span><span class="w"> </span><span class="c1">-- starting pc</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">pc</span><span class="p">]</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span 
class="s1">&#39;RET&#39;</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="c1">-- RAISE NOTICE &#39;[Debug] skipping past: %.&#39;, tokens[pc];</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="k">IS</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">hstore</span><span class="p">(</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span> <span class="w"> </span><span class="k">ELSE</span> <span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">defs</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">hstore</span><span class="p">(</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">pc</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- continue past &#39;RET&#39;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> </pre></div> <p>Concatenating a NULL hstore with a non-NULL hstore using <code>||</code> yields NULL, so we need that special case for the first definition.</p> <h3 id="return">Return</h3><p>The <code>RET</code> builtin pops a value off the return pointer stack and jumps to it.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;RET&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab last return pointer</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rps</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">rps</span><span class="p">)];</span> <span class="w"> </span><span class="c1">-- Drop last return pointer from stack</span> <span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">rps</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">rps</span><span 
class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Jump to last return pointer</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]::</span><span class="nb">int</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> </pre></div> <h3 id="function-calls">Function calls</h3><p>Forming the other half of function calls is the <code>CALL</code> builtin. This places the program counter (plus one, past the <code>CALL</code> token) onto the return pointer stack and jumps to the position of the function if it exists.</p> <p>A simple function call for the above <code>plus</code> function might be: <code>2 3 plus CALL</code> and would produce <code>5</code> on the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;CALL&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="c1">-- Grab item</span> <span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)];</span> <span 
class="w"> </span><span class="c1">-- Remove one item from stack</span> <span class="w"> </span><span class="n">stack</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">stack</span><span class="p">[</span><span class="mi">1</span><span class="p">:</span><span class="n">sm_alength</span><span class="p">(</span><span class="n">stack</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="c1">-- Store return pointer</span> <span class="w"> </span><span class="n">rps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">rps</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">pc</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">)::</span><span class="nb">text</span><span class="p">);</span> <span class="w"> </span><span class="c1">-- Fail if function not defined</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="n">defs</span><span class="o">?</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;No such function, %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="c1">-- 
Otherwise jump to function</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="n">NOTICE</span><span class="w"> </span><span class="s1">&#39;[Debug] Jumping to: %:%.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">defs</span><span class="o">-&gt;</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">pc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">defs</span><span class="o">-&gt;</span><span class="n">tmps</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> </pre></div> <p>And that's it! Those are all the basic instructions we need. 
Store all that code in <code>sm.sql</code> and grab the <code>test.sh</code> code from the previous post:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./test.sh sudo<span class="w"> </span>-u<span class="w"> </span>postgres<span class="w"> </span>psql<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;</span><span class="k">$(</span><span class="nb">printf</span><span class="w"> </span><span class="s2">&quot;%s;\n%s&quot;</span><span class="w"> </span><span class="s2">&quot;</span><span class="k">$(</span>cat<span class="w"> </span><span class="nv">$1</span><span class="k">)</span><span class="s2">&quot;</span><span class="w"> </span><span class="s2">&quot;</span><span class="nv">$2</span><span class="s2">&quot;</span><span class="k">)</span><span class="s2">&quot;</span> </pre></div> <p>And try out our port of recursive fibonacci:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>sm.sql<span class="w"> </span><span class="s2">&quot;SELECT sm_run(&#39;</span> <span class="s2">DEF fib</span> <span class="s2"> DUP 1 &gt; IF</span> <span class="s2"> 1- DUP 1- fib CALL SWAP fib CALL + THEN</span> <span class="s2"> RET</span> <span class="s2">20 fib CALL</span> <span class="s2">EXIT&#39;)&quot;</span> ... <span class="w"> </span>sm_run -------- <span class="w"> </span><span class="m">6765</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>Happy PL/pgSQL- and Forth-ish-ing!</p>
http://notes.eatonphil.com/exploring-plpgsql-forth-like.htmlFri, 29 Oct 2021 00:00:00 +0000Exploring PL/pgSQL: Strings, arrays, recursion, and parsing JSONhttp://notes.eatonphil.com/exploring-plpgsql.html<p class="note"> Next in exploring PL/pgSQL: <br /> <a href="exploring-plpgsql-forth-like.html">Implementing a Forth-like interpreter</a> </p><p>PostgreSQL comes with a builtin imperative programming language called PL/pgSQL. I used to think this language was scary because it has a bit more adornment than your usual language does. But looking deeper, it's actually reasonably pleasant to program in.</p> <p>In this post we'll get familiar with it by working with strings, arrays and recursive functions. We'll top it all off by building a parser for a subset of JSON (no nested objects, no arrays, no unicode, no decimals).</p> <p>The goal here is not production-quality code (an amazing JSON library is already built into PostgreSQL) but simply to get more familiar with the PL/pgSQL language.</p> <p>All code for this post is available on <a href="https://github.com/eatonphil/exploring-plpgsql">Github</a>.</p> <h3 id="creating-functions">Creating functions</h3><p>Functions are declared like tables. 
Here's a very simple one that returns the length of a string:</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>It's not a very useful function because <code>length</code> already exists but the point is to see a basic custom function.</p> <p>All statements in PL/pgSQL must end in a semicolon. Arguments do not have to be named. 
If they are not named they get default names of <code>$1</code> to <code>$N</code>.</p> <h4 id="named/unnamed-arguments">Named/unnamed arguments</h4><p>Here's how the function could be written without named arguments:</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="err">$</span><span class="mi">1</span><span class="p">);</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <h4 id="out-declarations">Out declarations</h4><p>PL/pgSQL also allows you to declare which variables will be returned in the function argument list. 
They call it OUT parameters but as far as I can tell it is not like OUT parameters in C# where you are modifying the value of a variable in an external scope.</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>This is still equivalent to the first function and is basically a shortcut for:</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">slength</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> 
</span><span class="nb">int</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">;</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">);</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">i</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>With OUT parameters, <code>RETURNS &lt;type&gt;</code> is optional, but if you include it, it must match the type implied by the OUT parameters. Without OUT parameters, <code>RETURNS &lt;type&gt;</code> is required in the function signature; <code>CREATE FUNCTION</code> fails if no result type is specified.</p> <p>Don't worry about case sensitivity too much. It's really only important, as in typical SQL, for mixed-case table and column names. But we won't be dealing with that situation in this article focused on programming PL/pgSQL.</p> <h4 id="testing-it-out">Testing it out</h4><p>Once the function is created, you can call it like <code>SELECT slength('foo');</code>. 
To make that easier to repeat, here's a helper script that loads a SQL file and then runs a command:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./test.sh sudo<span class="w"> </span>-u<span class="w"> </span>postgres<span class="w"> </span>psql<span class="w"> </span>-c<span class="w"> </span><span class="s2">&quot;</span><span class="k">$(</span><span class="nb">printf</span><span class="w"> </span><span class="s2">&quot;%s;\n%s&quot;</span><span class="w"> </span><span class="s2">&quot;</span><span class="k">$(</span>cat<span class="w"> </span><span class="nv">$1</span><span class="k">)</span><span class="s2">&quot;</span><span class="w"> </span><span class="s2">&quot;</span><span class="nv">$2</span><span class="s2">&quot;</span><span class="k">)</span><span class="s2">&quot;</span> $<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>./test.sh </pre></div> <p>After storing the above <code>slength</code> code in <code>slength.sql</code> we can run a test:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">test</span><span class="p">.</span><span class="n">sh</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">slength</span><span class="p">.</span><span class="k">sql</span><span class="w"> </span><span class="ss">&quot;SELECT slength(&#39;foo&#39;)&quot;</span> <span class="w"> </span><span class="n">slength</span> <span class="c1">---------</span> <span class="w"> </span><span class="mi">3</span> <span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">row</span><span class="p">)</span> </pre></div> <p>Easy!</p> <h3 id="numbers-and-recursion">Numbers and recursion</h3><p>Ok, now that we've got the basics of function definition down and a way to test the code, let's write a Fibonacci function.</p> <div 
class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>./fib.sql CREATE<span class="w"> </span>OR<span class="w"> </span>REPLACE<span class="w"> </span>FUNCTION<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>int<span class="o">)</span><span class="w"> </span>RETURNS<span class="w"> </span>int<span class="w"> </span>AS<span class="w"> </span><span class="nv">$$</span> BEGIN <span class="w"> </span>IF<span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>OR<span class="w"> </span><span class="nv">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="w"> </span>THEN <span class="w"> </span>RETURN<span class="w"> </span>i<span class="p">;</span> <span class="w"> </span>END<span class="w"> </span>IF<span class="p">;</span> <span class="w"> </span>RETURN<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>i<span class="w"> </span>-<span class="w"> </span><span class="m">2</span><span class="o">)</span><span class="p">;</span> END<span class="p">;</span> <span class="nv">$$</span><span class="w"> </span>LANGUAGE<span class="w"> </span>plpgsql<span class="p">;</span> </pre></div> <p>Everything in the if test is normal SQL WHERE clause syntax. This makes it very easy for folks familiar with SQL to pick up conditionals in PL/pgSQL.</p> <p>And there's no special syntax to allow function recursion. 
Nice!</p> <p>Run and test this function:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>./fib.sql<span class="w"> </span><span class="s2">&quot;SELECT fib(10)&quot;</span> <span class="w"> </span>fib ----- <span class="w"> </span><span class="m">55</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>Getting the hang of it?</p> <h3 id="strings-and-arrays">Strings and arrays</h3><p>You may have noticed that <code>length</code> used in <code>slength</code> is a builtin PostgreSQL function for dealing with strings. All builtin functions in PostgreSQL can be used in PL/pgSQL.</p> <p>In order to get familiar with using arrays in PL/pgSQL let's write a <code>string_to_array</code> function.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="p">.</span><span class="o">/</span><span class="n">string_to_array</span><span class="p">.</span><span class="k">sql</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">string_to_array</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">char</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="nb">char</span><span class="p">[];</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span 
class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">a</span><span class="p">;</span> <span 
class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>This is one way to do it: modifying array values directly by index. We need to coalesce because calling <code>array_length</code> on an empty array returns <code>NULL</code>.</p> <p>Another way to do this is by calling the builtin function <code>array_append</code>.</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">string_to_array</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">char</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="nb">char</span><span class="p">[];</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">s</span><span class="p">)</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">a</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="k">COALESCE</span><span class="p">(</span><span class="n">array_length</span><span class="p">(</span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)::</span><span class="nb">char</span><span class="p">);</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">a</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>We can test and run both:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./test.sh<span class="w"> </span>./string_to_array.sql<span class="w"> </span><span class="s2">&quot;SELECT string_to_array(&#39;foo&#39;)&quot;</span> <span class="w"> </span>string_to_array ----------------- <span class="w"> </span><span class="o">{</span>f,o,o<span class="o">}</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> $<span class="w"> </span>./test.sh<span class="w"> </span>./string_to_array2.sql<span class="w"> </span><span class="s2">&quot;SELECT 
string_to_array(&#39;foo&#39;)&quot;</span> <span class="w"> </span>string_to_array ----------------- <span class="w"> </span><span class="o">{</span>f,o,o<span class="o">}</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>Of course the builtin alternative might be <code>SELECT regexp_split_to_array('foo', '')</code> but we need the practice.</p> <h3 id="custom-compound-types">Custom compound types</h3><p>If we're going to lex and parse JSON, we're going to want to return an array of tokens from the lexer. A token will need to contain the type (e.g. number, string, syntax) and the string value of the token (e.g. <code>1</code>, <code>{</code>, <code>my great key</code>).</p> <p>PostgreSQL allows us to create compound types that we can then use as the base of an array:</p> <div class="highlight"><pre><span></span><span class="nv">DROP</span><span class="w"> </span><span class="nv">TYPE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="nv">EXISTS</span><span class="w"> </span><span class="nv">json_token</span><span class="w"> </span><span class="nv">CASCADE</span><span class="c1">;</span> <span class="nv">CREATE</span><span class="w"> </span><span class="nv">TYPE</span><span class="w"> </span><span class="nv">json_token</span><span class="w"> </span><span class="nv">AS</span><span class="w"> </span><span class="ss">(</span> <span class="w"> </span><span class="nv">kind</span><span class="w"> </span><span class="nv">text</span>, <span class="w"> </span><span class="nv">value</span><span class="w"> </span><span class="nv">text</span> <span class="ss">)</span><span class="c1">;</span> </pre></div> <p>We need to add <code>CASCADE</code> here because functions will have this type in their signature, and PostgreSQL otherwise refuses to drop a type that a function still depends on.</p> <p>We can create literals of this type like 
<code>SELECT ('number', '12')::json_token</code>.</p> <p>Now we're ready to build out the lexer.</p> <h3 id="lexing">Lexing</h3><p>The lexer's job is to clump together groups of characters into tokens.</p> <p>I'm going to describe this function in literate code.</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_lex</span><span class="p">(</span><span class="n">j</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span><span class="w"> </span><span class="k">OUT</span><span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="n">json_token</span><span class="p">[])</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_token</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> </pre></div> <p>This function takes a string in and returns an array of JSON tokens.</p> <div class="highlight"><pre><span></span><span class="k">DECLARE</span><span class="w"> </span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Index in loop</span> <span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current character in loop</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current accumulated characters</span> </pre></div> <p>We need 
to declare all variables up front.</p> <div class="highlight"><pre><span></span><span class="k">BEGIN</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">j</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> </pre></div> <p>The main loop just looks at all characters.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle syntax characters</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;{&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;}&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;,&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">c</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;syntax&#39;</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> </pre></div> <p>First we look if the character is a syntax character. 
If it is we append it to the array of tokens, increment the index, and go back to the start of the main loop.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle whitespace</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">regexp_replace</span><span class="p">(</span><span class="k">c</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;^\s+&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> </pre></div> <p>Then we check for whitespace characters. If replacing all whitespace characters returns an empty string then we know it's whitespace. We could also have done something like <code>IF c = ' ' OR c = '\n' ... 
THEN</code> instead.</p> <p>Same as before though if we find whitespace characters we move on (don't accumulate them) and restart the main loop.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle strings</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;&quot;&#39;</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">c</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span 
class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;string&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> </pre></div> <p>Next we loop through any strings we find and accumulate them as tokens before restarting the main loop.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="c1">-- Handle numbers</span> <span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">~</span><span class="w"> </span><span class="s1">&#39;^[0-9]+$&#39;</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="n">token</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">c</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">substr</span><span class="p">(</span><span class="n">j</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">length</span><span class="p">(</span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">ts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">ts</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;number&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)::</span><span class="n">json_token</span><span class="p">);</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> </pre></div> <p>Then we look for 
integers.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;Unknown character: %, at index: %; already found: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="k">c</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">,</span><span class="w"> </span><span class="n">ts</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>Lastly if none of those lexing handlers match, we give up! Then the loop is done and the function is too.</p> <p>There's no <code>RETURN</code> statement because we already declared an <code>OUT</code> variable.</p> <p>If we test and run this now:</p> <div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">&quot;SELECT json_lex(&#39;{\&quot;flubberty\&quot;: 12, \&quot;nice\&quot;: \&quot;a\&quot;}&#39;)&quot;</span> <span class="w"> </span>json_lex ---------------------------------------------------------------------------------------------------------------------------------------- <span class="w"> </span><span class="o">{</span><span class="s2">&quot;(syntax,{)&quot;</span>,<span class="s2">&quot;(string,flubberty)&quot;</span>,<span class="s2">&quot;(syntax,:)&quot;</span>,<span class="s2">&quot;(number,12)&quot;</span>,<span class="s2">&quot;(syntax,\&quot;,\&quot;)&quot;</span>,<span class="s2">&quot;(string,nice)&quot;</span>,<span class="s2">&quot;(syntax,:)&quot;</span>,<span class="s2">&quot;(string,a)&quot;</span>,<span 
class="s2">&quot;(syntax,})&quot;</span><span class="o">}</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>It's messy but it worked! Now on to parsing.</p> <h3 id="parsing">Parsing</h3><p>Our parser will only accept JSON objects. JSON objects will be defined as an array of key-value pairs. Custom types make this nice again.</p> <div class="highlight"><pre><span></span><span class="k">DROP</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="k">EXISTS</span><span class="w"> </span><span class="n">json_key_value</span><span class="w"> </span><span class="k">CASCADE</span><span class="p">;</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TYPE</span><span class="w"> </span><span class="n">json_key_value</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="nb">text</span><span class="p">,</span> <span class="w"> </span><span class="n">v</span><span class="w"> </span><span class="nb">text</span> <span class="p">);</span> </pre></div> <p>One thing PostgreSQL does not make nice is sum types or parametric types. But even if the value here is stored as text it can be easily cast to a number by the user. And again, we're not going to support nested objects/arrays. 
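</p> <p>To make the casting point concrete, here is a small sketch, assuming the <code>json_key_value</code> type above has been created. The literal pair and the column alias <code>v_plus_one</code> are invented for illustration:</p> <div class="highlight"><pre><span></span>-- Build one key-value pair, then cast its text value to an integer.
SELECT (kv).k, (kv).v::int + 1 AS v_plus_one
FROM (SELECT ('flubberty', '12')::json_key_value AS kv) AS t;
</pre></div> <p>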
But using <code>hstore</code> for key-values might be the better alternative if we wanted to build a real JSON parser.</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_parse</span><span class="p">(</span><span class="n">ts</span><span class="w"> </span><span class="n">json_token</span><span class="p">[],</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="nb">int</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="n">json_token</span><span class="p">;</span><span class="w"> </span><span class="c1">-- Current token in tokens loop</span> <span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[];</span> <span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="nb">text</span><span class="p">;</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;syntax&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> 
</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;{&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;Invalid JSON, must be an object, got: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> </pre></div> <p>First up in the parser is variable declarations and validating that this list of tokens represents a JSON object.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="n">WHILE</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;syntax&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;}&#39;</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span 
class="n">array_length</span><span class="p">(</span><span class="n">kvs</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;syntax&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;,&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;JSON key-value pair must be followed by a comma or closing brace, got: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> </pre></div> <p>Then 
we loop to find each key-value pair. If one has already been found, we need to find a comma before the next pair.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;string&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;JSON object must start with string key, got: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;syntax&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span 
class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&gt;</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;JSON object must start with string key followed by colon, got: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;number&#39;</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">kind</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;string&#39;</span><span class="w"> </span><span class="k">THEN</span> <span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array_append</span><span class="p">(</span><span class="n">kvs</span><span class="p">,</span><span class="w"> </span><span 
class="p">(</span><span class="n">k</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)::</span><span class="n">json_key_value</span><span class="p">);</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="k">CONTINUE</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;Invalid key-value pair syntax, got: %.&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">kvs</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>Then we just look for the key, colon, value syntax and fail if we don't see it. And that's it! 
Very simple when not dealing with arrays and nested objects.</p> <h3 id="helpers">Helpers</h3><p>Lastly it would just be nice to have a single function that calls lex and parse:</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_from_string</span><span class="p">(</span><span class="n">s</span><span class="w"> </span><span class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[]</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="n">json_parse</span><span class="p">(</span><span class="n">json_lex</span><span class="p">(</span><span class="n">s</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>And another function to look up a value in a parsed object by key:</p> <div class="highlight"><pre><span></span><span class="k">CREATE</span><span class="w"> </span><span class="k">OR</span><span class="w"> </span><span class="k">REPLACE</span><span class="w"> </span><span class="k">FUNCTION</span><span class="w"> </span><span class="n">json_get</span><span class="p">(</span><span class="n">kvs</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">[],</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span 
class="nb">text</span><span class="p">)</span><span class="w"> </span><span class="k">RETURNS</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="k">AS</span><span class="w"> </span><span class="err">$$</span> <span class="k">DECLARE</span> <span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="n">json_key_value</span><span class="p">;</span> <span class="k">BEGIN</span> <span class="w"> </span><span class="n">FOREACH</span><span class="w"> </span><span class="n">kv</span><span class="w"> </span><span class="k">IN</span><span class="w"> </span><span class="nb">ARRAY</span><span class="w"> </span><span class="n">kvs</span><span class="w"> </span><span class="n">LOOP</span> <span class="w"> </span><span class="k">IF</span><span class="w"> </span><span class="n">kv</span><span class="p">.</span><span class="n">k</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">key</span><span class="w"> </span><span class="k">THEN</span><span class="w"> </span><span class="k">RETURN</span><span class="w"> </span><span class="p">(</span><span class="n">kv</span><span class="p">.</span><span class="n">v</span><span class="p">::</span><span class="n">json_token</span><span class="p">).</span><span class="n">value</span><span class="p">;</span><span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="k">IF</span><span class="p">;</span> <span class="w"> </span><span class="k">END</span><span class="w"> </span><span class="n">LOOP</span><span class="p">;</span> <span class="w"> </span><span class="n">RAISE</span><span class="w"> </span><span class="k">EXCEPTION</span><span class="w"> </span><span class="s1">&#39;Key not found.&#39;</span><span class="p">;</span> <span class="k">END</span><span class="p">;</span> <span class="err">$$</span><span class="w"> </span><span class="k">LANGUAGE</span><span class="w"> </span><span 
class="n">plpgsql</span><span class="p">;</span> </pre></div> <p>And we're all set!</p> <h3 id="testing">Testing</h3><p>Let's try some bad syntax (missing a comma between pairs):</p> <div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">&quot;SELECT json_get(json_from_string(&#39;{\&quot;flubberty\&quot;: 12 \&quot;nice\&quot;: \&quot;a\&quot;}&#39;), &#39;ipo&#39;)&quot;</span> ERROR:<span class="w"> </span>JSON<span class="w"> </span>key-value<span class="w"> </span>pair<span class="w"> </span>must<span class="w"> </span>be<span class="w"> </span>followed<span class="w"> </span>by<span class="w"> </span>a<span class="w"> </span>comma<span class="w"> </span>or<span class="w"> </span>closing<span class="w"> </span>brace,<span class="w"> </span>got:<span class="w"> </span><span class="o">(</span>string,nice<span class="o">)</span>. CONTEXT:<span class="w"> </span>PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_parse<span class="o">(</span>json_token<span class="o">[]</span>,integer<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">18</span><span class="w"> </span>at<span class="w"> </span>RAISE PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_from_string<span class="o">(</span>text<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">3</span><span class="w"> </span>at<span class="w"> </span>RETURN </pre></div> <p>Sweet, it fails correctly.</p> <p>Now correct syntax but missing key:</p> <div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">&quot;SELECT json_get(json_from_string(&#39;{\&quot;flubberty\&quot;: 12, \&quot;nice\&quot;: \&quot;a\&quot;}&#39;), &#39;ipo&#39;)&quot;</span> ERROR:<span class="w"> </span>Key<span class="w"> </span>not<span class="w"> 
</span>found. CONTEXT:<span class="w"> </span>PL/pgSQL<span class="w"> </span><span class="k">function</span><span class="w"> </span>json_get<span class="o">(</span>json_key_value<span class="o">[]</span>,text<span class="o">)</span><span class="w"> </span>line<span class="w"> </span><span class="m">9</span><span class="w"> </span>at<span class="w"> </span>RAISE </pre></div> <p>And finally, correct syntax and existing key:</p> <div class="highlight"><pre><span></span>./test.sh<span class="w"> </span>./json.sql<span class="w"> </span><span class="s2">&quot;SELECT json_get(json_from_string(&#39;{\&quot;flubberty\&quot;: 12, \&quot;nice\&quot;: \&quot;a\&quot;}&#39;), &#39;flubberty&#39;)&quot;</span> <span class="w"> </span>json_get ---------- <span class="w"> </span><span class="m">12</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>row<span class="o">)</span> </pre></div> <p>Huzzah! Now hopefully PL/pgSQL is a little less scary to you, whether or not you decide to use it.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">For everyone dying to write imperative code in PostgreSQL, I wrote a post about PL/pgSQL 👽 It starts with implementing simple string and array functions, to recursive Fibonacci, to a small JSON parsing library. 
A nice little language with a great stdlib!<a href="https://t.co/m4Tff99N6R">https://t.co/m4Tff99N6R</a> <a href="https://t.co/2ZMJn2foNa">pic.twitter.com/2ZMJn2foNa</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1452339113131139072?ref_src=twsrc%5Etfw">October 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/exploring-plpgsql.htmlSun, 24 Oct 2021 00:00:00 +0000Experimenting with column- and row-oriented datastructureshttp://notes.eatonphil.com/experimenting-with-column-and-row-oriented-datastructures.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-18-experimenting-with-column-and-row-oriented-datastructures.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-10-18-experimenting-with-column-and-row-oriented-datastructures.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/experimenting-with-column-and-row-oriented-datastructures.htmlMon, 18 Oct 2021 00:00:00 +0000Notes on running Electronhttp://notes.eatonphil.com/notes-on-running-electron.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-10-13-notes-on-running-electron.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-10-13-notes-on-running-electron.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/notes-on-running-electron.htmlWed, 13 Oct 2021 00:00:00 +0000Enumerating and analyzing 40+ non-V8 JavaScript implementationshttp://notes.eatonphil.com/javascript-implementations.html<p>V8 is, I'm sure, the most used implementation of JavaScript today. Used in Chrome, (and by extension) Microsoft Edge, Node.js, etc. 
Safari's JavaScriptCore and Firefox's SpiderMonkey are also extremely mainstream implementations.</p> <p>But what else is out there? What if I want to embed JavaScript in a C program, or a Go program, or a Rust program, or a Java program, and so on? Or what if I want to run JavaScript on a microcontroller? Or use it as a base for language research? It turns out there are many high-quality implementations out there.</p> <p>This post describes a number of them and their implementation choices. I'm not going to cover V8, JavaScriptCore, or SpiderMonkey because they are massive and each contains multiple interpreters and compilers. Plus, you already know about them.</p> <p class="note"> I'm going to miss some implementations and get some details wrong. Please <a href="https://twitter.com/phil_eaton">Tweet</a> or <a href="mailto:[email protected]">email</a> me with your corrections! I'd be particularly interested to hear about pure-research implementations and about commercial, closed-source implementations of JavaScript. 
</p><h3 id="corporate-backed">Corporate-backed</h3><p>These are implementations that would make sense to look into for your own commercial, production applications.</p> <h4 id="on-the-jvm">On the JVM</h4><ul> <li><a href="https://github.com/oracle/graaljs">Oracle's GraalJS</a>: compiles JavaScript to JVM bytecode or GraalVM<ul> <li>Support: Full compatibility with latest ECMAScript specification</li> <li>Implementation language: Java</li> <li>Runtime: <a href="https://www.graalvm.org/">GraalVM</a> or <a href="https://www.graalvm.org/reference-manual/js/RunOnJDK/">stock JDK</a></li> <li>Parser: <a href="https://github.com/oracle/graaljs/blob/master/graal-js/src/com.oracle.js.parser/src/com/oracle/js/parser/Parser.java">Hand-written</a></li> <li>First release: <a href="https://github.com/oracle/graaljs/releases/tag/vm-19.0.0">2019?</a></li> <li>Notes: Replaced Nashorn as the default JavaScript implementation in JDK.</li> </ul> </li> <li><a href="https://github.com/mozilla/rhino">Mozilla's Rhino</a>: interprets and compiles JavaScript to JVM bytecode<ul> <li>Support: ES6</li> <li>Implementation language: Java</li> <li>Runtime: Both <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/Interpreter.java">interpreted through custom bytecode VM</a> and interpreted <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/optimizer/Codegen.java">after compiling to JVM bytecode</a> as an optimization</li> <li>Parser: <a href="https://github.com/mozilla/rhino/blob/master/src/org/mozilla/javascript/Parser.java">Hand-written</a></li> <li>First release: <a href="http://udn.realityripple.com/docs/Mozilla/Projects/Rhino/History">1998?</a></li> <li>Notes: Replaced by Nashorn as the default JavaScript engine on the JVM, but remains actively developed.</li> </ul> </li> <li><a href="https://github.com/openjdk/nashorn">Oracle's Nashorn</a>: compiles JavaScript to JVM bytecode<ul> <li>Support: ES5</li> <li>Implementation language: 
Java</li> <li>Runtime: compiles to <a href="https://github.com/openjdk/nashorn/tree/main/src/org.openjdk.nashorn/share/classes/org/openjdk/nashorn/internal/codegen">JVM bytecode</a></li> <li>Parser: <a href="https://github.com/openjdk/nashorn/blob/main/src/org.openjdk.nashorn/share/classes/org/openjdk/nashorn/internal/parser/Parser.java">Hand-written</a></li> <li>First release: <a href="https://blogs.oracle.com/nashorn/open-for-business">2012?</a></li> <li>Notes: Replaced Rhino as default JavaScript implementation on JVM. Replaced by GraalJS more recently, but remains actively developed.</li> </ul> </li> </ul> <h4 id="embeddable">Embeddable</h4><ul> <li><a href="https://github.com/nginx/njs">Nginx's njs</a><ul> <li>Support: ES5</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/nginx/njs/blob/master/src/njs_vmcode.c">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/nginx/njs/blob/master/src/njs_parser.c">Hand-written</a></li> </ul> </li> <li><a href="https://mp2.dk/techblog/chowjs/">ChowJS</a>: proprietary AOT compiler based on QuickJS for game developers<ul> <li>Support: everything QuickJS does presumably (see further down for QuickJS)</li> <li>Implementation language: C presumably</li> <li>Runtime: QuickJS's bytecode interpreter but also an AOT compiler</li> <li>Parser: QuickJS's presumably</li> <li>First release: <a href="https://mp2.dk/techblog/chowjs/">2021</a></li> <li>Notes: Code is not available so exact analysis on these points is not possible at the moment.</li> </ul> </li> <li><a href="https://github.com/ccxvii/mujs">Artifex's mujs</a><ul> <li>Support: ES5, probably</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/ccxvii/mujs/blob/master/jsrun.c">Bytecode stack-oriented VM</a></li> <li>Parser: <a href="https://github.com/ccxvii/mujs/blob/master/jsparse.c">Hand-written</a></li> <li>First release: <a href="https://github.com/ccxvii/mujs/releases/tag/1.0.0">2017?</a></li> <li>Notes: 
Originally part of MuPDF viewer, but now broken out. Thanks to <a href="https://twitter.com/rwoodsmall">@rwoodsmalljs</a> for mentioning!</li> </ul> </li> </ul> <h4 id="embedded-systems">Embedded Systems</h4><ul> <li><a href="https://github.com/Samsung/escargot">Samsung's Escargot</a><ul> <li>Support: ES2020</li> <li>Implementation language: C++</li> <li>Runtime: <a href="https://github.com/Samsung/escargot/tree/master/src/interpreter">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/Samsung/escargot/tree/master/src/parser">Hand-written</a></li> <li>First release: <a href="https://github.com/Samsung/escargot/graphs/contributors">2017?</a></li> </ul> </li> <li><a href="https://github.com/espruino/Espruino">Espruino</a><ul> <li>Support: parts of ES5, ES6, ES7/8</li> <li>Implementation language: C</li> <li>Runtime: Seems like direct recursive interpreting without an AST/intermediate form</li> <li>Parser: <a href="https://github.com/espruino/Espruino/blob/master/src/jsparse.c">Hand-written</a></li> <li>First release: <a href="https://github.com/espruino/Espruino/releases/tag/BEFORE_FUNCTION_REFACTOR">2012?</a></li> </ul> </li> <li><a href="https://github.com/cesanta/elk">Cesanta's Elk</a><ul> <li>Support: subset of ES6</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/cesanta/elk/blob/master/elk.c">Direct recursive interpreter without AST or bytecode VM</a></li> <li>Parser: <a href="https://github.com/cesanta/elk/blob/master/elk.c">Hand-written</a></li> <li>First release: <a href="https://github.com/cesanta/elk/releases/tag/0.0.1">2019?</a></li> <li>Notes: It does all of this with a GC and FFI in &lt;1400 lines of readable C code. 
Damn.</li> </ul> </li> <li><a href="https://github.com/cesanta/mjs">Cesanta's mJS</a><ul> <li>Support: subset of ES6</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/cesanta/mjs/blob/master/mjs.c#L3411">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/cesanta/mjs/blob/master/mjs.c#L12780">Hand-written</a></li> <li>First release: <a href="https://github.com/cesanta/mjs/releases/tag/1.5">2017?</a></li> </ul> </li> <li><a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsSyntaxical.c">Moddable's XS</a><ul> <li>Support: ES2018</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsRun.c">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/Moddable-OpenSource/moddable/blob/public/xs/sources/xsSyntaxical.c">Hand-written</a></li> <li>First release: <a href="https://www.moddable.com/XS7-TC-39">2017?</a></li> <li>Notes: More details at <a href="https://www.moddable.com/XS7-TC-39">https://www.moddable.com/XS7-TC-39</a> and <a href="https://www.moddable.com/faq#what-is-xs">https://www.moddable.com/faq#what-is-xs</a>.</li> </ul> </li> </ul> <h4 id="other">Other</h4><ul> <li><a href="https://github.com/facebook/hermes">Facebook's Hermes</a><ul> <li>Support: ES6 <a href="https://hermesengine.dev/docs/language-features">with exceptions</a></li> <li>Implementation language: C++</li> <li>Runtime: <a href="https://github.com/facebook/hermes/tree/main/lib/VM">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/facebook/hermes/blob/main/lib/Parser/JSParserImpl.cpp">Hand-written</a></li> <li>First release: <a href="https://github.com/facebook/hermes/releases/tag/v0.0.1">2019?</a></li> </ul> </li> <li><a href="https://github.com/qt/qtdeclarative/tree/dev/src/qml/jsruntime">Qt's V4</a><ul> <li>Support: ES5</li> <li>Implementation language: C++</li> <li>Runtime: <a 
href="https://github.com/qt/qtdeclarative/blob/dev/src/qml/jsruntime/qv4vme_moth.cpp">Bytecode VM</a> and JIT compiler</li> <li>Parser: <a href="https://github.com/qt/qtdeclarative/blob/dev/src/qml/parser/qqmljs.g">qlalr custom parser generator</a></li> <li>First release: 2013</li> <li>Notes: Unclear if it can be run standalone outside of Qt.</li> </ul> </li> </ul> <p>I don't know whether to put Microsoft's ChakraCore into this list or the next. So I'll put it here but note that as of this year (2021) they are transitioning it to a community-driven project.</p> <ul> <li><a href="https://github.com/chakra-core/ChakraCore">Microsoft's ChakraCore</a><ul> <li>Support: ES6, probably more</li> <li>Implementation language: C++</li> <li>Runtime: <a href="https://github.com/chakra-core/ChakraCore/tree/master/lib/Backend">Bytecode VM and JIT on x86/ARM</a></li> <li>Parser: <a href="https://github.com/chakra-core/ChakraCore/blob/master/lib/Parser/Parse.cpp">Hand-written</a></li> <li>First release: 2015?</li> </ul> </li> </ul> <h3 id="mature,-community-driven">Mature, community-driven</h3><p>Implementations toward the top are more reliable and proven. 
Implementations toward the bottom less so.</p> <p>If you are looking to get involved in language development, the implementations further down the list can be a great place to start since they typically need work on documentation, testing, and language features.</p> <ul> <li><a href="https://github.com/bellard/quickjs">Fabrice Bellard's QuickJS</a><ul> <li>Support: ES2020</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://raw.githubusercontent.com/bellard/quickjs/master/quickjs.c">Bytecode VM</a> (this is a single large file)</li> <li>Parser: <a href="https://raw.githubusercontent.com/bellard/quickjs/master/quickjs.c">Hand-written</a> (this is a single large file)</li> <li>First release: <a href="https://github.com/bellard/quickjs/commit/91459fb6723e29e923380cec0023af93819ae69d#diff-ead07c84baac57a9542f388a07a2a5209456ce790b04251bc9bd7d179ea85cb1R84">2019</a></li> </ul> </li> <li><a href="https://github.com/svaarala/duktape">DuktapeJS</a><ul> <li>Support: ES5, some parts of ES6/ES7</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/svaarala/duktape/blob/master/src-input/duk_js_executor.c">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/svaarala/duktape/blob/master/src-input/duk_js_compiler.c">Hand-written</a>, notably with no AST. 
It just directly compiles to its own bytecode.</li> <li>First release: <a href="https://duktape.org/download.html">2013</a></li> </ul> </li> <li><a href="https://github.com/engine262/engine262">engine262</a><ul> <li>Support: 100% spec compliance</li> <li>Implementation language: JavaScript</li> <li>Runtime: <a href="https://github.com/engine262/engine262/blob/14f50592362d889289e133fff4200e8e304c995a/src/runtime-semantics/IfStatement.mjs">AST interpreter</a></li> <li>Parser: <a href="https://github.com/engine262/engine262/blob/main/src/parser/ExpressionParser.mjs">Hand-written</a></li> </ul> </li> <li><a href="https://github.com/jerryscript-project/jerryscript">JerryScript</a><ul> <li>Support: ES5</li> <li>Implementation language: C</li> <li>Runtime: <a href="https://github.com/jerryscript-project/jerryscript/blob/master/jerry-core/vm/vm.c">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/jerryscript-project/jerryscript/blob/master/jerry-core/parser/js/js-parser.c">Hand-written</a></li> <li>First release: <a href="https://github.com/jerryscript-project/jerryscript/releases/tag/v1.0">2016?</a></li> </ul> </li> <li><a href="https://github.com/SerenityOS/serenity/tree/master/Userland/Libraries/LibJS">Serenity's LibJS</a><ul> <li>Support: <a href="https://libjs.dev/test262/">Progressing toward compliance</a></li> <li>Implementation language: C++</li> <li>Runtime: <a href="https://github.com/SerenityOS/serenity/tree/master/Userland/Libraries/LibJS/Bytecode">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/SerenityOS/serenity/blob/master/Userland/Libraries/LibJS/Parser.cpp">Hand-written</a></li> <li>Notes: Might also work outside of Serenity but documentation on building/running it on Linux is hard to find.</li> </ul> </li> <li><a href="https://github.com/dop251/goja">goja</a>: JavaScript interpreter for Go<ul> <li>Support: ES5</li> <li>Implementation language: Go</li> <li>Runtime: <a href="https://github.com/dop251/goja/blob/master/vm.go">Bytecode 
VM</a></li> <li>Parser: <a href="https://github.com/dop251/goja/blob/master/parser/statement.go">Hand-written</a></li> <li>First release: <a href="https://github.com/dop251/goja/graphs/contributors">2017?</a></li> </ul> </li> <li><a href="https://github.com/robertkrimen/otto">otto</a>: JavaScript interpreter for Go<ul> <li>Support: ES5</li> <li>Implementation language: Go</li> <li>Runtime: <a href="https://github.com/robertkrimen/otto/blob/373ff54384526e8336b5b597619d0923a4a83ae0/cmpl_evaluate_expression.go#L183">AST interpreter</a></li> <li>Parser: <a href="https://github.com/robertkrimen/otto/blob/master/parser/statement.go">Hand-written</a></li> <li>First release: <a href="https://github.com/robertkrimen/otto/graphs/contributors">2012?</a></li> <li>Notes: The AST interpreter-only implementation might suggest this implementation is slower than Goja. I don't have benchmarks for that.</li> </ul> </li> <li><a href="https://github.com/paulbartrum/jurassic">Jurassic</a>: JavaScript parser and interpreter for .NET<ul> <li>Support: ES5</li> <li>Implementation language: C#</li> <li>Runtime: Compiles to <a href="https://github.com/paulbartrum/jurassic/blob/ee6f4fa17e6205e15412a214b24d7575b0bd461c/Jurassic/Compiler/MethodGenerator/GlobalOrEvalMethodGenerator.cs#L139">.NET</a></li> <li>Parser: <a href="https://github.com/paulbartrum/jurassic/blob/master/Jurassic/Compiler/Parser/Parser.cs">Hand-written</a></li> <li>First release: <a href="https://github.com/paulbartrum/jurassic/graphs/contributors">2011?</a></li> </ul> </li> <li><a href="https://github.com/sebastienros/jint">Jint</a><ul> <li>Support: ES5, most of ES6/7/8</li> <li>Implementation language: C#</li> <li>Runtime: <a href="https://github.com/sebastienros/jint/blob/main/Jint/Runtime/Interpreter/Expressions/JintUnaryExpression.cs">AST interpreter</a></li> <li>Parser: <a href="https://github.com/sebastienros/esprima-dotnet/blob/main/src/Esprima/JavascriptParser.cs">Hand-written via Esprima.NET</a></li> <li>First 
release: <a href="https://github.com/sebastienros/jint/graphs/contributors">2014?</a></li> <li>Notes: Thanks <a href="https://news.ycombinator.com/user?id=fowl2">fowl2</a> for mentioning!</li> </ul> </li> <li><a href="https://github.com/nilproject/NiL.JS">NiL.JS</a><ul> <li>Support: ES6</li> <li>Implementation language: C#</li> <li>Runtime: <a href="https://github.com/nilproject/NiL.JS/blob/develop/NiL.JS/Expressions/Assignment.cs">AST interpreter</a></li> <li>Parser: <a href="https://github.com/nilproject/NiL.JS/blob/develop/NiL.JS/Core/Parser.cs">Hand-written</a></li> <li>First release: <a href="https://github.com/nilproject/NiL.JS/graphs/contributors">2014?</a></li> </ul> </li> <li><a href="https://github.com/NeilFraser/JS-Interpreter">Neil Fraser's JS-Interpreter</a><ul> <li>Support: ES5</li> <li>Implementation language: JavaScript</li> <li>Runtime: <a href="https://github.com/NeilFraser/JS-Interpreter/blob/master/interpreter.js">AST interpreter</a></li> <li>Parser: <a href="https://github.com/NeilFraser/JS-Interpreter/blob/master/acorn.js">Hand-written, uses Acorn</a></li> <li>First release: <a href="https://github.com/NeilFraser/JS-Interpreter/graphs/contributors">2014?</a></li> </ul> </li> <li><a href="https://github.com/BeRo1985/besen">BESEN</a>: Bytecode VM and JIT compiler in Object Pascal<ul> <li>Support: ES5</li> <li>Implementation language: Object Pascal</li> <li>Runtime: <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCode.pas">Bytecode VM</a> with <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCodeJITx86.pas">JIT for x86</a> and <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENCodeJITx64.pas">x86_64</a></li> <li>Parser: <a href="https://github.com/BeRo1985/besen/blob/master/src/BESENParser.pas">Hand-written</a></li> <li>First release: <a href="https://github.com/BeRo1985/besen/graphs/contributors">2015?</a></li> </ul> </li> </ul> <p>These last few are not toys but they are also more experimental or, in 
AssemblyScript's case, not JavaScript.</p> <ul> <li><a href="https://github.com/boa-dev/boa">boa</a>: JavaScript interpreter for Rust<ul> <li>Support: <a href="https://boa-dev.github.io/boa/test262/">Unclear</a></li> <li>Implementation language: Rust</li> <li>Runtime: <a href="https://github.com/boa-dev/boa/tree/master/boa/src/vm">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/boa-dev/boa/tree/master/boa/src/syntax/parser">Hand-written</a></li> <li>First release: <a href="https://github.com/boa-dev/boa/releases/tag/v0.2.0">2019?</a></li> </ul> </li> <li><a href="https://github.com/AssemblyScript/assemblyscript">AssemblyScript</a><ul> <li>Support: Subset of TypeScript</li> <li>Implementation language: AssemblyScript subset of TypeScript</li> <li>Runtime: <a href="https://github.com/AssemblyScript/assemblyscript/blob/main/src/compiler.ts">webassembly</a></li> <li>Parser: <a href="https://github.com/AssemblyScript/assemblyscript/blob/main/src/parser.ts">Hand-written</a></li> </ul> </li> <li><a href="https://github.com/nickmain/kawa-scheme/tree/master/gnu/ecmascript">JavaScript in Kawa Scheme</a></li> <li><a href="https://wingolog.org/archives/2009/02/22/ecmascript-for-guile">JavaScript in GNU Guile Scheme</a></li> <li><a href="https://github.com/ReevaJS/reeva">ReevaJS</a><ul> <li>Support: ES5 (with exceptions)</li> <li>Implementation language: Kotlin</li> <li>Runtime: <a href="https://github.com/ReevaJS/reeva/blob/master/src/main/kotlin/com/reevajs/reeva/interpreter/Interpreter.kt">Stack machine</a></li> <li>Parser: <a href="https://github.com/ReevaJS/reeva/blob/master/src/main/kotlin/com/reevajs/reeva/parsing/Parser.kt">Hand-written</a></li> </ul> </li> </ul> <h3 id="research-implementations">Research Implementations</h3><ul> <li><a href="https://github.com/higgsjs/Higgs">Higgs</a><ul> <li>Support: Unclear</li> <li>Implementation language: D</li> <li>Runtime: <a href="https://github.com/higgsjs/Higgs/blob/master/source/runtime/vm.d">Bytecode VM</a> and 
<a href="https://github.com/higgsjs/Higgs/tree/master/source/jit">JIT compiler on x64</a></li> <li>Parser: <a href="https://github.com/higgsjs/Higgs/blob/master/source/parser/parser.d">Hand-written</a></li> </ul> </li> <li><a href="https://github.com/tugawa/ejs-new">eJS</a><ul> <li>Support: Unclear</li> <li>Implementation language: Java</li> <li>Runtime: Bytecode VM</li> <li>Parser: ANTLR</li> <li>Notes: eJS is a framework to generate JavaScript VMs that are specialised for applications.</li> </ul> </li> <li><a href="https://github.com/endojs/Jessie">Jessie</a>: safe subset of JavaScript for non-exploitable smart contracts<ul> <li>Support: some subset of ES2017</li> <li>???</li> <li>See <a href="https://github.com/agoric-labs/jessica">https://github.com/agoric-labs/jessica</a> for more info.</li> </ul> </li> <li><a href="https://github.com/b9org/b9">https://github.com/b9org/b9</a></li> <li><a href="https://www.defensivejs.com/">https://www.defensivejs.com/</a></li> </ul> <p class="note"> Thanks to <a href="https://twitter.com/smarr">@smarr</a> for contributing eJS, Higgs, and b9! </p><h3 id="notable-abandoned">Notable Abandoned</h3><ul> <li><a href="https://github.com/DigitalMars/DMDScript">DMDScript</a><ul> <li>Support: Unclear</li> <li>Implementation language: D</li> <li>Runtime: <a href="https://github.com/DigitalMars/DMDScript/blob/master/engine/source/dmdscript/opcodes.d#L15">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/DigitalMars/DMDScript/blob/master/engine/source/dmdscript/parse.d">Hand-written</a></li> <li>Notes: It's possible this is commercially maintained by DigitalMars but I'm not sure. There are also references in this repo to another C++ implementation of DMDScript that may be commercial. 
Thanks to <a href="https://twitter.com/moon_chilled">@moon_chilled</a> for mentioning!</li> </ul> </li> <li><a href="https://github.com/toshok/echojs">EchoJS</a><ul> <li>Support: Unclear</li> <li>Implementation language: JavaScript</li> <li>Runtime: Native through LLVM</li> <li>Parser: <a href="https://github.com/toshok/esprima/tree/e4445c9cc2530d672c4e9f68f5e2a53673b57af0">Hand-written via Esprima</a></li> </ul> </li> <li><a href="https://github.com/haileys/twostroke">twostroke</a><ul> <li>Support: Unclear</li> <li>Implementation language: Ruby</li> <li>Runtime: <a href="https://github.com/haileys/twostroke/blob/master/lib/twostroke/runtime/vm_frame.rb">Bytecode VM</a></li> <li>Parser: <a href="https://github.com/haileys/twostroke/blob/master/lib/twostroke/parser.rb">Hand-written</a></li> </ul> </li> <li><a href="https://github.com/progval/rpython-langjs">PyPy-JS</a><ul> <li>Support: Unclear</li> <li>Implementation language: RPython</li> <li>Runtime: <a href="https://github.com/progval/rpython-langjs/blob/master/js/jscode.py">RPython</a></li> <li>Parser: <a href="https://github.com/progval/rpython-langjs/blob/master/js/jsgrammar.txt">EBNF parser generator</a></li> </ul> </li> <li><a href="https://github.com/jterrace/js.js/">js.js</a><ul> <li>Support: Unclear</li> <li>Implementation language: JavaScript</li> <li>Runtime: Too scared to look at the gigantic files in this repo.</li> <li>Parser: Ditto.</li> </ul> </li> <li><a href="https://github.com/fholm/IronJS">IronJS</a><ul> <li>Support: ES3</li> <li>Implementation language: F#</li> <li>Runtime: .NET through <a href="https://docs.microsoft.com/en-us/dotnet/framework/reflection-and-codedom/dynamic-language-runtime-overview">DLR</a>, I think.</li> <li>Parser: <a href="https://github.com/fholm/IronJS/blob/master/Src/IronJS/Compiler.Parser.fs">Hand-written</a></li> </ul> </li> <li><a href="https://github.com/polydojo/jispy">jispy</a><ul> <li>Support: Unclear</li> <li>Implementation language: Python</li> <li>Runtime: <a 
href="https://github.com/polydojo/jispy/blob/master/jispy.py#L730">AST interpreter</a></li> <li>Parser: <a href="https://github.com/polydojo/jispy/blob/master/jispy.py#L311">Unclear</a></li> </ul> </li> <li><a href="https://metacpan.org/pod/JE#Simple-Use">JE: Pure-Perl JavaScript Engine</a></li> <li><a href="https://docs.racket-lang.org/javascript/index.html">Dave Herman's JavaScript for PLT Scheme</a></li> </ul> <h3 id="notable-toy-implementations">Notable toy implementations</h3><p>Great for inspiration if you've never implemented a language before.</p> <ul> <li><a href="https://github.com/timruffles/js-to-c">js-to-c</a>: A JavaScript to C compiler, written in C</li> <li><a href="https://github.com/mras0/mjs">mjs</a>: AST interpreter for not just ES5 or even ES3 but also ES1</li> <li><a href="https://github.com/gojisvm/gojis">gojis</a>: AST interpreter in Go</li> <li><a href="https://github.com/DelSkayn/toyjs">toyjs</a>: Bytecode VM in Rust</li> <li><a href="https://github.com/CrimsonAS/v2">v2</a>: Bytecode VM in Go</li> <li><a href="https://github.com/githubyang/SparrowJS">SparrowJS</a>: AST interpreter in C++</li> <li><a href="https://github.com/eatonphil/jsc">jsc</a>: My own experiment compiling JavaScript to C++/libV8</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post is up! Enumerating and analyzing 40+ non-V8 JavaScript implementations; of course with links to source code and parser &amp; runtime/backend decisions.<br><br>I hope you enjoy learning about JavaScript engines as much as I did. 
😁<a href="https://t.co/dEX06WU38f">https://t.co/dEX06WU38f</a> <a href="https://t.co/AoYScphG6m">pic.twitter.com/AoYScphG6m</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1440436962305789952?ref_src=twsrc%5Etfw">September 21, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/javascript-implementations.htmlTue, 21 Sep 2021 00:00:00 +0000Writing a simple JSON library from scratch: a tour through modern C++http://notes.eatonphil.com/writing-a-simple-json-library-in-modern-cpp.html<p>Modern C++ has a lot of cool features. Move semantics means passing structs to and from functions is cheap. <code>std::shared_ptr</code> means I don't have to manage any memory; no more <code>new</code>/<code>delete</code>! (But try as I might to understand <code>std::unique_ptr</code>, I'm just not there yet.)</p> <p>The syntax has also gotten some treatment with <code>auto</code> and tuple destructuring.</p> <p>In order to test out this modern C++ I wanted a small but meaningful project that operates on very dynamic data. The two that always come to mind are JSON parsers and Lisp interpreters.</p> <p>This post walks through writing a basic JSON library from scratch using only the standard library. The source code for the resulting library is available <a href="https://github.com/eatonphil/cpp-json">on GitHub</a>.</p> <p>The biggest simplification we'll make is that rather than full JSON numbers, we'll only allow integers.</p> <p class="note"> Big caveat! I couldn't be farther from a C++ expert! Email or tweet me if you see mistakes, madness, lies. </p><h3 id="api">API</h3><p>The two big parts of the API will be lexing (turning a string into an array of tokens) and parsing (turning an array of tokens into a JSON object-tree). A better implementation would have the lexer take a character stream rather than a string, but taking a string is simpler. 
So we'll stick with that.</p> <p>Both of these functions can fail so we'll return a tuple in both cases with a string containing a possibly blank error message.</p> <p>We will define the header in <code>./include/json.hpp</code>.</p> <div class="highlight"><pre><span></span><span class="cp">#ifndef JSON_H</span> <span class="cp">#define JSON_H</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;tuple&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;vector&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span> <span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">);</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span 
class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span> <span class="cp">#endif</span> </pre></div> <p>The token returned by <code>lex</code> will need to contain the token's string value, the location (offset) in the original source, a pointer to the full source (for debugging), and the token's type. The token type itself will be an enum of either string, number, syntax (colon, bracket, etc.), boolean, or null.</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;memory&gt;</span> <span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span> <span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONTokenType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span> <span class="k">struct</span><span class="w"> </span><span class="nc">JSONToken</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span 
class="p">;</span> <span class="w"> </span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">location</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">full_source</span><span class="p">;</span> <span class="p">};</span> <span class="p">...</span> <span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span> <span class="p">...</span> </pre></div> <p>This is the only place in the entire code we'll pass around a pointer. Using <code>std::shared_ptr</code> means we don't have to do any manual memory management either. No <code>new</code> or <code>delete</code>.</p> <p>Next, <code>JSONValue</code> is a struct containing optional string, boolean, number, array, and object fields with a type enum to differentiate them.</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;map&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;optional&gt;</span> <span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span> <span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONValueType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Object</span><span class="p">,</span><span class="w"> 
</span><span class="n">Array</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span> <span class="k">struct</span><span class="w"> </span><span class="nc">JSONValue</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">string</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="kt">double</span><span class="o">&gt;</span><span class="w"> </span><span class="n">number</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="kt">bool</span><span class="o">&gt;</span><span class="w"> </span><span class="n">boolean</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">array</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span 
class="n">JSONValue</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">object</span><span class="p">;</span> <span class="w"> </span><span class="n">JSONValueType</span><span class="w"> </span><span class="n">type</span><span class="p">;</span> <span class="p">};</span> <span class="k">enum</span><span class="w"> </span><span class="k">class</span><span class="w"> </span><span class="nc">JSONTokenType</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">Null</span><span class="w"> </span><span class="p">};</span> <span class="p">...</span> </pre></div> <p>Thanks to <code>std::optional</code> we can avoid using pointers to describe these fields. 
I did take a look at <code>std::variant</code> but it seemed like its API was overly complex.</p> <p>Finally, we'll add two more functions: a high level <code>parse</code> function that combines the job of lexing and parsing, and a <code>deparse</code> function for printing a <code>JSONValue</code> as a JSON string.</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">);</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span 
class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">);</span> <span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span> <span class="p">...</span> </pre></div> <p>Now we're ready to start on the implementation.</p> <h3 id="lexing">Lexing</h3><p>First up is lexing: turning a JSON string into an array of tokens, each of which is a number, string, null keyword, boolean keyword, or syntax like comma or colon.</p> <p>The main lex loop skips whitespace and calls helper functions for each kind of token. If a token is found, we accumulate it and move to the end of that token (some tokens like <code>:</code> are a single character; others like <code>"my great string"</code> span multiple characters).</p> <p>Each token we find gets a pointer to the original JSON source for use in error messages if parsing fails. Again this will be the only time we explicitly pass around pointers in this implementation. 
We don't do any manual management because we're going to use <code>std::shared_ptr</code>.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;json.hpp&quot;</span> <span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">;</span> <span class="w"> </span><span class="c1">// All tokens will embed a pointer to the raw JSON for debugging purposes</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">original_copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">make_shared</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="p">(</span><span class="n">raw_json</span><span class="p">);</span> <span class="w"> </span><span 
class="k">auto</span><span class="w"> </span><span class="n">generic_lexers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="n">lex_syntax</span><span class="p">,</span><span class="w"> </span><span class="n">lex_string</span><span class="p">,</span><span class="w"> </span><span class="n">lex_number</span><span class="p">,</span><span class="w"> </span><span class="n">lex_null</span><span class="p">,</span><span class="w"> </span><span class="n">lex_true</span><span class="p">,</span><span class="w"> </span><span class="n">lex_false</span><span class="p">};</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Skip past whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lex_whitespace</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">new_index</span><span 
class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">lexer</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">generic_lexers</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">lexer</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">);</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">new_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Error while lexing, return early</span> 
<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Store reference to the original source</span> <span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">full_source</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_copy</span><span class="p">;</span> <span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">token</span><span class="p">);</span> <span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="n">found</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">true</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">found</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="s">&quot;Unable to lex&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> <span class="p">}</span><span class="w"> </span><span class="c1">// namespace json</span> </pre></div> <p>Two neat things you'll notice in there are tuple literal syntax (<code>{tokens, ""}</code>) and how easy it is to type a value containing an array of function pointers using auto (<code>generic_lexers</code>).</p> <h4 id="format_error">format_error</h4><p>Since we referenced <code>format_error</code>, let's define it. This needs to accept a message prefix, the full JSON string, and the index offset where the error should point to.</p> <p>Inside the function we'll iterate over the string until we find the entire line containing this index offset. 
We'll display that line and a pointer to the character that is causing/starting the error.</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;sstream&gt;</span> <span class="k">namespace</span><span class="w"> </span><span class="nn">json</span><span class="w"> </span><span class="p">{</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">format_error</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">source</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span><span class="w"> </span><span class="n">s</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">counter</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span 
class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">source</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">line</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\t&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">column</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">column</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span> <span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="c1">// Continue accumulating the lastline for debugging</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">counter</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">source</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">source</span><span class="p">[</span><span class="n">counter</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span> <span class="w"> </span><span class="n">counter</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot; at line &quot;</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">line</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;, column &quot;</span><span 
class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">column</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">lastline</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;^&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">();</span> <span class="p">}</span> <span class="p">...</span> </pre></div> <p>The <code>printf</code> API is annoying and Clang 12 (latest Clang on latest Fedora) doesn't seem to support <code>std::format</code>. So we just use <code>std::ostringstream</code> (from the <code>&lt;sstream&gt;</code> header) to do string "formatting".</p> <p>But ok, back to lexing! Next up: whitespace.</p> <h4 id="lex_whitespace">lex_whitespace</h4><p>This function's job is to skip past whitespace. 
Thankfully we've got <code>std::isspace</code> to help.</p> <div class="highlight"><pre><span></span><span class="kt">int</span><span class="w"> </span><span class="nf">lex_whitespace</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">isspace</span><span class="p">(</span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">index</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>It's very simple!</p> <h4 id="lex_syntax">lex_syntax</h4><p>All of the generic lexers follow the same pattern. 
They return either a valid token and the index where the token ends, or they return an error string.</p> <p>Since all the syntax elements in JSON (<code>,</code>, <code>:</code>, <code>{</code>, <code>}</code>, <code>[</code>, and <code>]</code>) are single characters, we don't need to write a "longest substring" helper function. We simply check if the current character is one of these characters and return a syntax token if so.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_syntax</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span 
class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;[&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;]&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;{&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;}&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;:&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;,&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> </pre></div> <h3 id="lex_string">lex_string</h3><p>This one manages state so it's a little more complex. We need to check if the current character is a double quote, then iterate over characters until we find the ending quote.</p> <p>It's possible to hit EOF here so we need to handle that case. And handling escaped quotes (<code>\"</code>) inside a string is left as an exercise for the reader. :)</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_string</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span> <span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="n">JSONTokenType</span><span class="o">::</span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">original_index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// TODO: handle nested quotes</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">],</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span 
class="o">!=</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="s">&quot;Unexpected EOF while lexing string&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">c</span><span class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Nothing too special to discuss here. 
So on to lexing numbers.</p> <h3 id="lex_number">lex_number</h3><p>Since we're only supporting integers, this one has no internal state. We check characters until we stop seeing digits.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_number</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span> <span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Number</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s">&quot;&quot;</span><span class="p">;</span> <span class="w"> </span><span class="c1">// TODO: handle not just integers</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="nb">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;0&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;9&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span 
class="n">c</span><span class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Done. On to keywords: <code>null</code>, <code>false</code>, <code>true</code>.</p> <h3 id="lex_keyword">lex_keyword</h3><p>This is a helper function that will check for a literal keyword.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">keyword</span><span class="p">,</span> <span class="w"> </span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">type</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">original_index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span 
class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">original_index</span><span class="p">;</span> <span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">{</span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">type</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">};</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">keyword</span><span class="p">[</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">original_index</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">[</span><span class="n">index</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">raw_json</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">original_index</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="n">keyword</span><span class="p">.</span><span class="n">length</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">keyword</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">token</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>With this defined we can now implement <code>lex_false</code>, <code>lex_true</code>, and <code>lex_null</code>.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_null</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;null&quot;</span><span 
class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Null</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="p">}</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_true</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="p">}</span> <span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">lex_false</span><span 
class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">raw_json</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">lex_keyword</span><span class="p">(</span><span class="n">raw_json</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;false&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Boolean</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And that's it for lexing! And although we defined all of these top-down, you'll want to write them mostly in reverse order or put in forward declarations.</p> <p>If you wanted to you could now write a simple <code>main.cpp</code> like:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;json.hpp&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;iostream&gt;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;Expected JSON input argument to parse&quot;</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">in</span><span class="p">{</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]};</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">lex</span><span class="p">(</span><span class="n">in</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span 
class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">tokens</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Set up a Makefile:</p> <div class="highlight"><pre><span></span><span class="nf">main</span><span class="o">:</span><span class="w"> </span>*.<span class="n">cpp</span> ./<span class="n">include</span>/*.<span class="n">hpp</span> <span class="w"> </span>clang++<span class="w"> </span>-g<span class="w"> </span>-Wall<span class="w"> </span>-std<span class="o">=</span>c++2a<span class="w"> </span>-I./include<span class="w"> </span>*.cpp<span class="w"> </span>-o<span class="w"> </span><span class="nv">$@</span> </pre></div> <p>Build with <code>make</code> and run <code>./main '{"a": 1}'</code> to see the list of tokens printed out.</p> <p>Now let's move on to parsing from the array of tokens.</p> <h3 id="parsing">Parsing</h3><p>This process takes the array of tokens and turns them into a tree structure. 
The tree develops children as we spot <code>[</code> or <code>{</code> tokens. The tree child ends when we spot <code>]</code> or <code>}</code> tokens.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span> <span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">stod</span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">number</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Number</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">boolean</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span 
class="n">Boolean</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Null</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">string</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">String</span><span class="p">},</span><span class="w"> </span><span class="n">index</span><span 
class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Syntax</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;[&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">array</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">array</span><span class="p">,</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Array</span><span class="p">},</span><span 
class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;{&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">object</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">JSONValue</span><span class="p">{.</span><span class="n">object</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">optional</span><span class="p">(</span><span class="n">object</span><span class="p">),</span><span class="w"> </span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">Object</span><span class="p">},</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span 
class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Failed to parse&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">)};</span> <span class="p">}</span> </pre></div> <p>This in turn references <code>format_parse_error</code> on failure, which is an error-string-maker similar to <code>format_error</code>. It actually calls <code>format_error</code> with more details specific to parsing.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">JSONTokenType_to_string</span><span class="p">(</span><span class="n">JSONTokenType</span><span class="w"> </span><span class="n">jtt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">jtt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;String&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="s">&quot;Number&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Syntax</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Syntax&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Boolean&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONTokenType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;Null&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">format_parse_error</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">JSONToken</span><span class="w"> </span><span class="n">token</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span><span class="w"> </span><span class="n">s</span><span class="p">;</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> 
</span><span class="s">&quot;Unexpected token &#39;&quot;</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;&#39;, type &#39;&quot;</span> <span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">JSONTokenType_to_string</span><span class="p">(</span><span class="n">token</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;&#39;, index &quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">base</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">format_error</span><span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">str</span><span class="p">(),</span><span class="w"> </span><span class="o">*</span><span class="n">token</span><span class="p">.</span><span class="n">full_source</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="p">.</span><span class="n">location</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p class="note"> This function depends on a helper for turning the <code>JSONTokenType</code> enum into a string. As a user it's very annoying when languages don't give you stringifier methods for enums by default for debugging. I know there are some ways to do this with reflection in C++ but it seemed hairy. But I digress. 
</p><h4 id="parse_array">parse_array</h4><p>This function was called by <code>parse</code> when we found an opening bracket. It needs to recursively call <code>parse</code>, then check for a comma and call <code>parse</code> again ... until it finds the closing bracket.</p> <p>It will fail if it ever finds something other than a comma or closing bracket following a successful call to <code>parse</code>.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">parse_array</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="o">&gt;</span><span class="w"> </span><span class="n">children</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span 
class="n">tokens</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;]&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">children</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">children</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span> <span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Expected comma after element in array&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">child</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">children</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">child</span><span class="p">);</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Unexpected EOF while parsing array&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">back</span><span class="p">())};</span> <span class="p">}</span> </pre></div> <p>And finally we need to implement <code>parse_object</code>.</p> <h4 id="parse_object">parse_object</h4><p>This function is similar to <code>parse_array</code> but it needs to find <code>$string COLON $parse() COMMA</code> pattern pairs.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o">&lt;</span><span 
class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">JSONValue</span><span class="o">&gt;</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span> <span class="n">parse_object</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">JSONToken</span><span class="o">&gt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">index</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o">&lt;</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="p">,</span><span class="w"> </span><span class="n">JSONValue</span><span class="o">&gt;</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">index</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">tokens</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span 
class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;}&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">values</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span 
class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">values</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Expected comma after element in object&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span> <span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;Expected key-value pair or closing brace in object&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">new_index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">key</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="n">JSONValueType</span><span class="o">::</span><span class="n">String</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Expected string key in object&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="p">(</span><span class="n">t</span><span class="p">.</span><span class="n">type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">JSONTokenType</span><span class="o">::</span><span class="n">Syntax</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">t</span><span class="p">.</span><span class="n">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;:&quot;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span> <span class="w"> </span><span class="n">index</span><span class="p">,</span> <span class="w"> </span><span class="n">format_parse_error</span><span class="p">(</span><span class="s">&quot;Expected colon after key in object&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">t</span><span class="p">)};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">index</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tokens</span><span class="p">[</span><span class="n">index</span><span class="p">];</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">new_index1</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">]</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error1</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">index</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">values</span><span class="p">[</span><span class="n">key</span><span class="p">.</span><span class="n">string</span><span class="p">.</span><span class="n">value</span><span class="p">()]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">;</span> <span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">new_index1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">values</span><span class="p">,</span><span class="w"> </span><span class="n">index</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>These parse functions are all slightly tedious but still very simple. 
And thankfully, we're done!</p> <p>We can now implement the variation of <code>parse</code> that ties together lexing and parsing.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">tuple</span><span class="o">&lt;</span><span class="n">JSONValue</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="o">&gt;</span><span class="w"> </span><span class="n">parse</span><span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">source</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">tokens</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">lex</span><span class="p">(</span><span class="n">source</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{{},</span><span class="w"> </span><span class="n">error</span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">]</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">error1</span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>And we're completely done with the string-to-<code>JSONValue</code> code.</p> <h3 id="deparse">deparse</h3><p>The very last piece of the implementation is to reverse the previous operations: generate a string from a <code>JSONValue</code>.</p> <p>This is a recursive function and the only mildly tricky part is deciding how to handle whitespace if we want prettier output.</p> <div class="highlight"><pre><span></span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="nf">deparse</span><span class="p">(</span><span class="n">JSONValue</span><span class="w"> </span><span class="n">v</span><span class="p">,</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">whitespace</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">String</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\&quot;</span><span 
class="s">&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">string</span><span class="p">.</span><span class="n">value</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\&quot;</span><span class="s">&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Boolean</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">boolean</span><span class="p">.</span><span class="n">value</span><span class="p">()</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s">&quot;false&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Number</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">to_string</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">number</span><span class="p">.</span><span class="n">value</span><span class="p">());</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Null</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s">&quot;null&quot;</span><span class="p">;</span> 
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span class="no">Array</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;[</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">array</span><span class="p">.</span><span class="n">value</span><span class="p">();</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">size</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">whitespace</span><span 
class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">a</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;]&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">JSONValueType</span><span class="o">::</span><span 
class="no">Object</span><span class="p">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;{</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">;</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">v</span><span class="p">.</span><span class="n">object</span><span class="p">.</span><span class="n">value</span><span class="p">();</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="k">auto</span><span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="o">&amp;</span><span class="p">[</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">]</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">values</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\&quot;</span><span class="s">&quot;</span><span class="w"> </span><span 
class="o">+</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="s">&quot;</span><span class="se">\&quot;</span><span class="s">: &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">deparse</span><span class="p">(</span><span class="n">value</span><span class="p">,</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">values</span><span class="p">.</span><span class="n">size</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;,&quot;</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">;</span> <span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">whitespace</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;}&quot;</span><span 
class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Done. Done. Done.</p> <h3 id="main.cpp">main.cpp</h3><p>This program will simply accept a JSON input, parse it, and pretty print it right back out. Kind of like a simplified <code>jq</code>.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;json.hpp&quot;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;iostream&gt;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">argc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="s">&quot;Expected JSON input argument to parse&quot;</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span 
class="n">string</span><span class="w"> </span><span class="n">in</span><span class="p">{</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]};</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="p">[</span><span class="n">ast</span><span class="p">,</span><span class="w"> </span><span class="n">error</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">parse</span><span class="p">(</span><span class="n">in</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">error</span><span class="p">.</span><span class="n">size</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cerr</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">error</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">cout</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="n">json</span><span class="o">::</span><span class="n">deparse</span><span class="p">(</span><span class="n">ast</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Build it with <code>make</code>, using the build rules we already defined, and run it against something big like <a 
href="https://github.com/eatonphil/cpp-json/blob/main/test/glossary.json">this</a>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>cpp-json $<span class="w"> </span>make $<span class="w"> </span>./main<span class="w"> </span><span class="s2">&quot;</span><span class="k">$(</span>cat<span class="w"> </span>./test/glossary.json<span class="k">)</span><span class="s2">&quot;</span> <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;glossary&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;GlossDiv&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;GlossList&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;GlossEntry&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;Abbrev&quot;</span>:<span class="w"> </span><span class="s2">&quot;ISO 8879:1986&quot;</span>, <span class="w"> </span><span class="s2">&quot;Acronym&quot;</span>:<span class="w"> </span><span class="s2">&quot;SGML&quot;</span>, <span class="w"> </span><span class="s2">&quot;GlossDef&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;GlossSeeAlso&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="s2">&quot;GML&quot;</span>, <span class="w"> </span><span class="s2">&quot;XML&quot;</span> <span class="w"> </span><span class="o">]</span>, <span class="w"> </span><span class="s2">&quot;para&quot;</span>:<span class="w"> </span><span class="s2">&quot;A meta-markup language, used to create markup languages such as DocBook.&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;GlossSee&quot;</span>:<span class="w"> </span><span 
class="s2">&quot;markup&quot;</span>, <span class="w"> </span><span class="s2">&quot;GlossTerm&quot;</span>:<span class="w"> </span><span class="s2">&quot;Standard Generalized Markup Language&quot;</span>, <span class="w"> </span><span class="s2">&quot;ID&quot;</span>:<span class="w"> </span><span class="s2">&quot;SGML&quot;</span>, <span class="w"> </span><span class="s2">&quot;SortAs&quot;</span>:<span class="w"> </span><span class="s2">&quot;SGML&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;title&quot;</span>:<span class="w"> </span><span class="s2">&quot;S&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;title&quot;</span>:<span class="w"> </span><span class="s2">&quot;example glossary&quot;</span> <span class="w"> </span><span class="o">}</span> <span class="o">}</span> </pre></div> <p>Or something incorrect like:</p> <div class="highlight"><pre><span></span>./main<span class="w"> </span><span class="s1">&#39;{&quot;foo&quot;: [{ 1: 2 }]}&#39;</span> Unexpected<span class="w"> </span>token<span class="w"> </span><span class="s1">&#39;1&#39;</span>,<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="s1">&#39;Number&#39;</span>,<span class="w"> </span>index Expected<span class="w"> </span>string<span class="w"> </span>key<span class="w"> </span><span class="k">in</span><span class="w"> </span>object<span class="w"> </span>at<span class="w"> </span>line<span class="w"> </span><span class="m">1</span>,<span class="w"> </span>column<span class="w"> </span><span class="m">11</span> <span class="o">{</span><span class="s2">&quot;foo&quot;</span>:<span class="w"> </span><span class="o">[{</span><span class="w"> </span><span class="m">1</span>:<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">}]}</span> <span class="w"> 
</span>^ </pre></div> <p>And give Valgrind the old try:</p> <div class="highlight"><pre><span></span>valgrind<span class="w"> </span>./main<span class="w"> </span><span class="s1">&#39;{&quot;a&quot;: [1, 2, null, { &quot;c&quot;: 129 }]}&#39;</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Memcheck,<span class="w"> </span>a<span class="w"> </span>memory<span class="w"> </span>error<span class="w"> </span><span class="nv">detector</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Copyright<span class="w"> </span><span class="o">(</span>C<span class="o">)</span><span class="w"> </span><span class="m">2002</span>-2017,<span class="w"> </span>and<span class="w"> </span>GNU<span class="w"> </span>GPL<span class="err">&#39;</span>d,<span class="w"> </span>by<span class="w"> </span>Julian<span class="w"> </span>Seward<span class="w"> </span>et<span class="w"> </span>al. <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Using<span class="w"> </span>Valgrind-3.17.0<span class="w"> </span>and<span class="w"> </span>LibVEX<span class="p">;</span><span class="w"> </span>rerun<span class="w"> </span>with<span class="w"> </span>-h<span class="w"> </span><span class="k">for</span><span class="w"> </span>copyright<span class="w"> </span><span class="nv">info</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>Command:<span class="w"> </span>./main<span class="w"> </span><span class="o">{</span><span class="s2">&quot;a&quot;</span>:<span class="se">\ </span><span class="o">[</span><span class="m">1</span>,<span class="se">\ </span><span class="m">2</span>,<span class="se">\ </span>null,<span class="se">\ </span><span class="o">{</span><span class="se">\ </span><span class="s2">&quot;c&quot;</span>:<span class="se">\ </span><span 
class="m">129</span><span class="se">\ </span><span class="o">}]}</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span> <span class="o">{</span> <span class="w"> </span><span class="s2">&quot;a&quot;</span>:<span class="w"> </span><span class="o">[</span> <span class="w"> </span><span class="m">1</span>.000000, <span class="w"> </span><span class="m">2</span>.000000, <span class="w"> </span>null, <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;c&quot;</span>:<span class="w"> </span><span class="m">129</span>.000000 <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">]</span> <span class="o">}==</span><span class="nv">153027</span><span class="o">==</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>HEAP<span class="w"> </span>SUMMARY: <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>use<span class="w"> </span>at<span class="w"> </span>exit:<span class="w"> </span><span class="m">0</span><span class="w"> </span>bytes<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="nv">blocks</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>total<span class="w"> </span>heap<span class="w"> </span>usage:<span class="w"> </span><span class="m">128</span><span class="w"> </span>allocs,<span class="w"> </span><span class="m">128</span><span class="w"> </span>frees,<span class="w"> </span><span class="m">105</span>,386<span class="w"> </span>bytes<span class="w"> </span><span class="nv">allocated</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span 
class="w"> </span>All<span class="w"> </span>heap<span class="w"> </span>blocks<span class="w"> </span>were<span class="w"> </span>freed<span class="w"> </span>--<span class="w"> </span>no<span class="w"> </span>leaks<span class="w"> </span>are<span class="w"> </span><span class="nv">possible</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span> <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>For<span class="w"> </span>lists<span class="w"> </span>of<span class="w"> </span>detected<span class="w"> </span>and<span class="w"> </span>suppressed<span class="w"> </span>errors,<span class="w"> </span>rerun<span class="w"> </span>with:<span class="w"> </span>-s <span class="o">==</span><span class="nv">153027</span><span class="o">==</span><span class="w"> </span>ERROR<span class="w"> </span>SUMMARY:<span class="w"> </span><span class="m">0</span><span class="w"> </span>errors<span class="w"> </span>from<span class="w"> </span><span class="m">0</span><span class="w"> </span>contexts<span class="w"> </span><span class="o">(</span>suppressed:<span class="w"> </span><span class="m">0</span><span class="w"> </span>from<span class="w"> </span><span class="m">0</span><span class="o">)</span> </pre></div> <p>Pretty sweet. 
Modern C++, I like it!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I don&#39;t do a lot of C++ so I wanted to get a sense for what it can look like today.<br><br>This post walks through a number of new-ish C++ features as we build a handwritten recursive descent parser for JSON using only the standard library.<a href="https://t.co/cCN6nP0pDi">https://t.co/cCN6nP0pDi</a> <a href="https://t.co/0AZNEZv4Ss">pic.twitter.com/0AZNEZv4Ss</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1431000902710796292?ref_src=twsrc%5Etfw">August 26, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/writing-a-simple-json-library-in-modern-cpp.htmlThu, 26 Aug 2021 00:00:00 +0000Parser generators vs. handwritten parsers: surveying major language implementations in 2021http://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.html<p>Developers often think parser generators are the sole legit way to build programming language frontends, possibly because compiler courses in university teach lex/yacc variants. But do any modern programming languages actually use parser generators anymore?</p> <p>To find out, this post presents a non-definitive survey of the parsing techniques used by various major programming language implementations.</p> <h3 id="cpython:-peg-parser">CPython: PEG parser</h3><p>Until CPython 3.9 the default parser was built using <a href="https://www.python.org/dev/peps/pep-0269/">pgen</a>, a custom parser generator. CPython 3.9 made a new PEG parser the default, and 3.10 (which hasn't been released yet) removes the pgen-based parser entirely. The team thought the PEG parser was a <a href="https://www.python.org/dev/peps/pep-0617/">better fit for expressing the language</a>. At the time the switch from pgen to PEG parser improved speed 10% but increased memory usage by 10% as well.</p> <p>The PEG grammar is defined <a href="https://github.com/python/cpython/blob/v3.9.6/Grammar/python.gram">here</a>. 
(It is getting renamed in 3.10 though so check the directory for a file of a similar name if you browse 3.10+).</p> <p class="note"> This section was corrected by <a href="https://www.reddit.com/r/ProgrammingLanguages/comments/p8vvcs/parser_generators_vs_handwritten_parsers/h9tbuve/?utm_source=reddit&utm_medium=web2x&context=3">MegaIng</a> on Reddit. Originally I mistakenly claimed the previous parser was handwritten. It was not. <br /><br /> Thanks <a href="https://twitter.com/jryans">J. Ryan Stinnett</a> for a correction about the change in speed in the new PEG parser. </p><h3 id="gcc:-handwritten">GCC: Handwritten</h3><p>Source code for the C parser available <a href="https://github.com/gcc-mirror/gcc/blob/releases/gcc-12.1.0/gcc/c/c-parser.cc">here</a>. It used to use Bison until <a href="https://gcc.gnu.org/gcc-4.1/changes.html">GCC 4.1 in 2006</a>. The C++ parser also switched from Bison to a handwritten parser <a href="https://gcc.gnu.org/gcc-3.4/changes.html">2 years earlier</a>.</p> <h3 id="clang:-handwritten">Clang: Handwritten</h3><p>Not only handwritten but the same <em>file</em> handles parsing C, Objective-C and C++. Source code is available <a href="https://github.com/llvm/llvm-project/blob/llvmorg-12.0.1/clang/lib/Parse/Parser.cpp">here</a>.</p> <h3 id="ruby:-yacc-like-parser-generator">Ruby: Yacc-like Parser Generator</h3><p>Ruby uses Bison. 
The grammar for the language can be found <a href="https://github.com/ruby/ruby/blob/v3_0_2/parse.y">here</a>.</p> <h3 id="v8-javascript:-handwritten">V8 JavaScript: Handwritten</h3><p>Source code available <a href="https://github.com/v8/v8/blob/9.5.38/src/parsing/parser.cc">here</a>.</p> <h3 id="zend-engine-php:-yacc-like-parser-generator">Zend Engine PHP: Yacc-like Parser Generator</h3><p>Source code available <a href="https://github.com/php/php-src/blob/php-8.0.9/Zend/zend_language_parser.y">here</a>.</p> <h3 id="typescript:-handwritten">TypeScript: Handwritten</h3><p>Source code available <a href="https://github.com/microsoft/TypeScript/blob/v4.3.5/src/compiler/parser.ts">here</a>.</p> <h3 id="bash:-yacc-like-parser-generator">Bash: Yacc-like Parser Generator</h3><p>Source code for the grammar is available <a href="http://git.savannah.gnu.org/cgit/bash.git/tree/parse.y?h=bash-5.1">here</a>.</p> <h3 id="chromium-css-parser:-handwritten">Chromium CSS Parser: Handwritten</h3><p>Source code available <a href="https://github.com/chromium/chromium/blob/95.0.4617.2/third_party/blink/renderer/core/css/parser/css_parser_impl.cc">here</a>.</p> <h3 id="java-(openjdk):-handwritten">Java (OpenJDK): Handwritten</h3><p>You can find the source code <a href="https://github.com/openjdk/jdk/blob/jdk-18%2B11/src/jdk.compiler/share/classes/com/sun/tools/javac/parser/JavacParser.java">here</a>.</p> <p>Some <a href="https://openjdk.java.net/projects/compiler-grammar/">older commentary</a> calls this implementation fragile. But a Java contributor <a href="https://twitter.com/BrianGoetz/status/1429227723055042568">suggests the situation has improved since Java 8</a>.</p> <h3 id="golang:-handwritten">Golang: Handwritten</h3><p>Until Go 1.6 the compiler used a yacc-based parser. The source code for that grammar is available <a href="https://github.com/golang/go/blob/go1.5/src/cmd/compile/internal/gc/y.go">here</a>.</p> <p>In Go 1.6 they switched to a handwritten parser. 
You can find that change <a href="https://go-review.googlesource.com/c/go/+/16665/">here</a>. There was a reported 18% speed increase when parsing files and a reported 3% speed increase in building the compiler itself when switching.</p> <p>You can find the source code for the compiler's parser <a href="https://github.com/golang/go/blob/go1.17/src/cmd/compile/internal/syntax/parser.go">here</a>.</p> <h3 id="roslyn:-handwritten">Roslyn: Handwritten</h3><p>The C# parser source code is available <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2019-Version-16.11/src/Compilers/CSharp/Portable/Parser/LanguageParser.cs">here</a>. The Visual Basic parser source code is <a href="https://github.com/dotnet/roslyn/blob/Visual-Studio-2019-Version-16.11/src/Compilers/VisualBasic/Portable/Parser/Parser.vb">here</a>.</p> <p>A C# contributor mentioned a few key reasons for using a handwritten parser <a href="https://news.ycombinator.com/item?id=13915150">here</a>.</p> <h3 id="lua:-handwritten">Lua: Handwritten</h3><p>Source code available <a href="https://github.com/lua/lua/blob/v5.4.3/lparser.c">here</a>.</p> <h3 id="swift:-handwritten">Swift: Handwritten</h3><p>Source code available <a href="https://github.com/apple/swift/blob/swift-5.4.2-RELEASE/lib/Parse/Parser.cpp">here</a>.</p> <h3 id="r:-yacc-like-parser-generator">R: Yacc-like Parser Generator</h3><p>I couldn't find it at first but <a href="https://www.reddit.com/r/programming/comments/p8vv1l/parser_generators_vs_handwritten_parsers/h9tl763/?utm_source=reddit&amp;utm_medium=web2x&amp;context=3">Liorithiel</a> showed me the parser source code is <a href="https://github.com/wch/r-source/blob/trunk/src/main/gram.y">here</a>.</p> <h3 id="julia:-handwritten-...-in-scheme">Julia: Handwritten ... in Scheme</h3><p>Julia's parser is handwritten but not in Julia. It's in Scheme! 
Source code available <a href="https://github.com/JuliaLang/julia/blob/v1.6.2/src/julia-parser.scm">here</a>.</p> <h3 id="postgresql:-yacc-like-parser-generator">PostgreSQL: Yacc-like Parser Generator</h3><p>PostgreSQL uses Bison for parsing queries. Source code for the grammar is available <a href="https://github.com/postgres/postgres/blob/REL_13_STABLE/src/backend/parser/gram.y">here</a>.</p> <h3 id="mysql:-yacc-parser-generator">MySQL: Yacc Parser Generator</h3><p>Source code for the grammar is available <a href="https://github.com/mysql/mysql-server/blob/8.0/sql/sql_yacc.yy">here</a>.</p> <h3 id="sqlite:-yacc-like-parser-generator">SQLite: Yacc-like Parser Generator</h3><p>SQLite uses its own parser generator called <a href="https://www.sqlite.org/lemon.html">Lemon</a>. Source code for the grammar is available <a href="https://github.com/sqlite/sqlite/blob/version-3.36.0/src/parse.y">here</a>.</p> <h3 id="summary">Summary</h3><p>Of the <a href="https://redmonk.com/sogrady/2021/03/01/language-rankings-1-21/">2021 Redmonk top 10 languages</a>, 8 of them have a handwritten parser. Ruby and Python use parser generators.</p> <p>Although parser generators are still used in major language implementations, maybe it's time for universities to start teaching handwritten parsing?</p> <p class="note"> This tweet was published before I was corrected about Python's parser. It should say 8/10 but I cannot edit the tweet. </p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Let&#39;s actually survey the parsing techniques used by major programming languages in 2021 (with links to code 👾).<br><br>In this post we discover that 9/10 of the top languages by <a href="https://twitter.com/redmonk?ref_src=twsrc%5Etfw">@redmonk</a> use a handwritten parser as opposed to a parser generator. 
😱<a href="https://t.co/M69TqN78G5">https://t.co/M69TqN78G5</a> <a href="https://t.co/sGsdDmwshB">pic.twitter.com/sGsdDmwshB</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1429137493019045899?ref_src=twsrc%5Etfw">August 21, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/parser-generators-vs-handwritten-parsers-survey-2021.htmlSat, 21 Aug 2021 00:00:00 +0000Practical? Common Lisp on the JVM: A quick intro to ABCL for modern web appshttp://notes.eatonphil.com/practical-common-lisp-on-the-jvm.html<p>In a ridiculous attempt to <a href="https://news.ycombinator.com/item?id=28036679">prove an internet wrong</a> about the practicality of Lisp (Common Lisp specifically), I tried to get a simple (but realistic) web app running. After four days and <a href="https://github.com/armedbear/abcl/pull/379">a patch to ABCL</a> I got something working.</p> <p>The code I had in mind would look something like this:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">port</span><span class="w"> </span><span class="mi">8080</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="nv">make-server</span><span class="w"> </span><span class="nv">port</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="s">&quot;GET&quot;</span><span class="w"> </span><span class="s">&quot;/&quot;</span><span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span 
class="s">&quot;My index!&quot;</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="s">&quot;GET&quot;</span><span class="w"> </span><span class="s">&quot;/search&quot;</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">template</span><span class="w"> </span><span class="s">&quot;search.tmpl&quot;</span><span class="w"> </span><span class="o">&#39;</span><span class="p">((</span><span class="s">&quot;version&quot;</span><span class="w"> </span><span class="s">&quot;0.1.0&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="s">&quot;results&quot;</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;cat&quot;</span><span class="w"> </span><span class="s">&quot;dog&quot;</span><span class="w"> </span><span class="s">&quot;mouse&quot;</span><span class="p">)))))))</span> </pre></div> <p>And <code>search.tmpl</code> would be some Jinja-like text file:</p> <div class="highlight"><pre><span></span><span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span> <span class="p">&lt;</span><span class="nt">title</span><span class="p">&gt;</span>Version {{ version }}<span class="p">&lt;/</span><span class="nt">title</span><span class="p">&gt;</span> {% for item in results %} <span class="p">&lt;</span><span class="nt">h2</span><span class="p">&gt;</span>{{ item }}<span class="p">&lt;/</span><span class="nt">h2</span><span class="p">&gt;</span> {% endfor %} <span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span> </pre></div> <p>The source code for this post can be found <a 
href="https://github.com/eatonphil/jvm-lisp-examples">on GitHub</a>.</p> <h3 id="picking-a-language,-libraries">Picking a language, libraries</h3><p><a href="https://abcl.org">Armed Bear Common Lisp</a> (ABCL) is the only Common Lisp implementation I'm aware of that can hook into a major library ecosystem like that of the JVM or CLR. In theory, this makes it a safe suggestion for folks who want the stability and resources of the ecosystem even if they aren't using its flagship language.</p> <p>I wanted to use some micro web framework like <a href="https://sparkjava.com/">Spark</a> or <a href="https://micronaut.io/">Micronaut</a>.</p> <p>The problem with libraries like Micronaut (and <a href="https://eclipse-ee4j.github.io/jersey/">Jersey</a>) is that they do a lot of dynamic inspection to figure out how to register controllers and whatnot. This is certainly convenient for developers using the library in Java. But it becomes an ordeal when you're trying to use the library through a foreign function interface (FFI) in another language. An example of this is a framework that scans all files in a directory for a <code>@GET</code> annotation.</p> <p>On the other hand, Spark seemed to have a hard requirement on bringing in a WebSocket library, which caused some issues during configuration. So I ended up going with <a href="https://jooby.io/">Jooby</a> and <a href="https://netty.io/">Netty</a> (as the underlying server).</p> <p>Finally, I looked into a few Jinja-like template libraries and settled on <a href="https://pebbletemplates.io/">Pebble</a> since <a href="https://github.com/HubSpot/jinjava">Jinjava</a> <a href="https://github.com/HubSpot/jinjava/issues/317">wouldn't load for me</a>.</p> <h3 id="3rd-party-jars-and-foreign-function-calls">3rd-party jars and foreign function calls</h3><p>So you've got your Maven dependencies and ran <code>mvn install</code>. 
Your <code>pom.xml</code> looks like this:</p> <div class="highlight"><pre><span></span><span class="cp">&lt;?xml version=&quot;1.0&quot; encoding=&quot;utf-8&quot;?&gt;</span> <span class="nt">&lt;project&gt;</span> <span class="w"> </span><span class="nt">&lt;modelVersion&gt;</span>4.0.0<span class="nt">&lt;/modelVersion&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>com.github.eatonphil<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>abcl-rest-api-hello-world<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;dependencies&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>io.jooby<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jooby<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.10.0<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>io.jooby<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jooby-netty<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.10.0<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>io.pebbletemplates<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>pebble<span 
class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>3.1.5<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependencies&gt;</span> <span class="nt">&lt;/project&gt;</span> </pre></div> <p>ABCL has a package called <code>abcl-asdf</code> that helps you resolve dependencies through Maven and your filesystem. We'll import it and a package it depends on (<code>abcl-contrib</code>):</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">require</span><span class="w"> </span><span class="ss">:abcl-contrib</span><span class="p">)</span> <span class="p">(</span><span class="nb">require</span><span class="w"> </span><span class="ss">:abcl-asdf</span><span class="p">)</span> </pre></div> <p>All our code will go into a single <code>main.lisp</code> file.</p> <p>To import a specific package from Maven you call <code>abcl-asdf:resolve</code> with a colon-separated string containing the Maven package group id and artifact id. 
Then you pass that result to <code>abcl-asdf:as-classpath</code> and pass that result to <code>java:add-to-classpath</code>.</p> <p>It will look like this:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">setf</span><span class="w"> </span><span class="nv">imports</span><span class="w"> </span><span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;io.jooby:jooby&quot;</span> <span class="w"> </span><span class="s">&quot;io.jooby:jooby-netty&quot;</span> <span class="w"> </span><span class="s">&quot;io.pebbletemplates:pebble&quot;</span><span class="p">))</span> <span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nb">import</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">imports</span> <span class="w"> </span><span class="nb">do</span><span class="w"> </span><span class="p">(</span><span class="nv">java:add-to-classpath</span> <span class="w"> </span><span class="p">(</span><span class="nv">abcl-asdf:as-classpath</span><span class="w"> </span><span class="p">(</span><span class="nv">abcl-asdf:resolve</span><span class="w"> </span><span class="nb">import</span><span class="p">))))</span> </pre></div> <p>Now you can call functions within these packages. If you want to call a Java method using only builtins it looks like <code>(jcall "method" "com.organization.package.Class" object arg1 arg2 ... argN)</code>. If you want to call a static Java method you use <code>(jstatic ...)</code> instead of <code>(jcall ...)</code>.</p> <p>It seems that ABCL will automatically convert simple types from their Lisp representation to Java but it will not turn a list into an array. 
If a Java function requires an array you'll have to do that conversion explicitly with a function like <code>(java:jnew-array-from-list "java.lang.String" my-string-list)</code>.</p> <p>When using the builtin Java FFI you always need to use the fully qualified name for classes, like <code>java.lang.Object</code> for <code>Object</code> or <code>java.util.ArrayList</code> for <code>ArrayList</code>.</p> <p>Alternatively you can <code>(require :jss)</code> to get access to a simpler syntax for making Java calls. A method call looks like <code>(#"method" object arg1 arg2 ... argN)</code>. To create a new instance of a class you call <code>(jss:new 'className)</code>. When you use JSS you don't need to fully qualify a class name unless there is more than one class with the same name. For example, to create a new Jooby application instance we can call <code>(jss:new 'Jooby)</code>. As long as the class can be found in the class path JSS will resolve it.</p> <h3 id="some-real-code">Some real code</h3><p>The real code will look similar to the pseudo-code at the top of this article. 
We'll stub out the library-specific wrappers for rendering a template and for registering a route.</p> <p>Fumbling around the <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Server.java#L35">Jooby source code</a> we see this snippet of Java:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">Server</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Netty</span><span class="p">();</span><span class="w"> </span><span class="c1">// or Jetty or Utow</span> <span class="w"> </span><span class="o">*</span> <span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">App</span><span class="w"> </span><span class="n">app</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">App</span><span class="p">();</span> <span class="w"> </span><span class="o">*</span> <span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">start</span><span class="p">(</span><span class="n">app</span><span class="p">);</span> <span class="w"> </span><span class="o">*</span> <span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="o">*</span> <span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">stop</span><span class="p">();</span> </pre></div> <p><code>Netty</code> comes from the <code>jooby-netty</code> artifact in the <code>io.jooby</code> group on Maven. And <code>App</code> is some object that extends <code>io.jooby.Jooby</code>. 
Since we're not using an OOP language though we're going to try avoiding classes as much as possible. So we'll just create a new instance of <code>io.jooby.Jooby</code> and add routes directly to it.</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context</span><span class="p">)</span> <span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">)</span> <span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span> <span class="w"> </span><span class="no">nil</span><span class="p">)</span> <span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">register-endpoints</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">&quot;GET&quot;</span><span class="w"> </span><span class="s">&quot;/&quot;</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;An index!&quot;</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">&quot;GET&quot;</span><span class="w"> 
</span><span class="s">&quot;/search&quot;</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">template</span><span class="w"> </span><span class="s">&quot;search.tmpl&quot;</span><span class="w"> </span><span class="o">`</span><span class="p">((</span><span class="s">&quot;version&quot;</span><span class="w"> </span><span class="s">&quot;1.0.0&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="s">&quot;results&quot;</span><span class="w"> </span><span class="o">,</span><span class="p">(</span><span class="nv">java:jarray-from-list</span><span class="w"> </span><span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;cat&quot;</span><span class="w"> </span><span class="s">&quot;dog&quot;</span><span class="w"> </span><span class="s">&quot;mouse&quot;</span><span class="p">)))))))</span> <span class="w"> </span><span class="p">(</span><span class="nv">route</span><span class="w"> </span><span class="nv">app</span><span class="w"> </span><span class="s">&quot;GET&quot;</span><span class="w"> </span><span class="s">&quot;/hello-world&quot;</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;Hello world!&quot;</span><span class="p">)))</span> <span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">port</span><span class="w"> </span><span class="mi">8080</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span 
class="ss">&#39;Netty</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;Jooby</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">register-endpoints</span><span class="w"> </span><span class="nv">app</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;setOptions&quot;</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;setPort&quot;</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;ServerOptions</span><span class="p">)</span><span class="w"> </span><span class="nv">port</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;start&quot;</span><span class="w"> </span><span class="nv">server</span><span class="w"> </span><span class="nv">app</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;join&quot;</span><span class="w"> </span><span class="nv">server</span><span class="p">))</span> </pre></div> <p>Easy enough. Now we just need to implement <code>route</code> and <code>template</code>.</p> <h3 id="implementing-java-classes-in-abcl">Implementing Java classes in ABCL</h3><p>We are again not going the happy path with fancy Java syntax (which is fine if you're using Java) like the Jooby documentation suggests. 
Scouring the <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Jooby.java#L546">Jooby source code again</a> it looks like we can call <code>route</code> on the <code>Jooby</code> class with a method string, a path string, and an object implementing the <code>io.jooby.Route.Handler</code> interface.</p> <p>Since this handler argument is an interface, we cannot cheat again by creating an instance of it; we'll have to actually create a new class in Lisp that implements it. Thankfully there's only one method we need to implement to satisfy this interface, <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/Route.java#L256">apply</a>. It accepts an <code>io.jooby.Context</code> object and returns a <code>java.lang.Object</code>. The framework then does introspection to figure out what exactly the object is and whether it needs to transform it into a string to be returned as an HTTP response body.</p> <p>To create a new class in ABCL we call <code>(java:jnew-runtime-class "classname" :interfaces '("an interface name") :methods '(("method name 1" "return type" ("first parameter type" ...) (lambda (this arg1 ...) 
body))))</code>:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;route&quot;</span> <span class="w"> </span><span class="nv">app</span> <span class="w"> </span><span class="nc">method</span> <span class="w"> </span><span class="nv">path</span> <span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-runtime-class</span> <span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\/</span><span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\-</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span> <span class="w"> </span><span class="ss">:interfaces</span><span class="w"> </span><span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;io.jooby.Route$Handler&quot;</span><span class="p">)</span> <span class="w"> </span><span class="ss">:methods</span><span class="w"> </span><span class="o">`</span><span class="p">(</span> <span class="w"> </span><span class="p">(</span><span class="s">&quot;apply&quot;</span><span class="w"> </span><span class="s">&quot;java.lang.Object&quot;</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;io.jooby.Context&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span 
class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">funcall</span><span class="w"> </span><span class="o">,</span><span class="nv">handler</span><span class="w"> </span><span class="nv">ctx</span><span class="p">))))))))</span> </pre></div> <p>One thing to note is that when referring to a nested class (one declared inside another class) we need to address it with the <code>io.jooby.Route$Handler</code> syntax rather than as <code>io.jooby.Route.Handler</code>, the way you would refer to it in Java. In the latter case ABCL thinks <code>Route</code> is a package when in fact it's just a class.</p> <p>If you run this now with <code>abcl --load main.lisp</code>, it will work until you hit an endpoint. The problem is how Jooby tries to figure out the real type of the returned object.</p> <p>The app will crash somewhere around <a href="https://github.com/jooby-project/jooby/blob/2.x/jooby/src/main/java/io/jooby/internal/RouterImpl.java#L560">here</a> calling <code>analyzer.returnType(route.getHandle())</code>.</p> <p>In this case it tries to <a href="https://github.com/jooby-project/jooby/blob/f47eda4500bc4b76b23d24d4d77aa2ab3cc19e95/jooby/src/main/java/io/jooby/internal/RouteAnalyzer.java#L44">open and parse the (Java) source code</a> of our application to try to find the return type for this <code>apply</code> function.</p> <p>That's a problem since our code isn't Java.
Through trial and error I realized we can trick Jooby/Java/somebody into figuring out the correct return type by adding another implementation of <code>apply</code> that returns a <code>String</code> to our class.</p> <p>The full <code>route</code> code now looks like this:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">route</span><span class="w"> </span><span class="p">(</span><span class="nv">app</span><span class="w"> </span><span class="nc">method</span><span class="w"> </span><span class="nv">path</span><span class="w"> </span><span class="nv">handler</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;route&quot;</span> <span class="w"> </span><span class="nv">app</span> <span class="w"> </span><span class="nc">method</span> <span class="w"> </span><span class="nv">path</span> <span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-runtime-class</span> <span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\/</span><span class="w"> </span><span class="p">(</span><span class="nb">substitute</span><span class="w"> </span><span class="sc">#\$</span><span class="w"> </span><span class="sc">#\-</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span> <span class="w"> </span><span class="ss">:interfaces</span><span class="w"> </span><span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;io.jooby.Route$Handler&quot;</span><span class="p">)</span> <span class="w"> </span><span class="ss">:methods</span><span class="w"> </span><span class="o">`</span><span class="p">(</span> <span class="w"> </span><span class="c1">;; Need to define this one to make Jooby 
figure out the return type</span> <span class="w"> </span><span class="c1">;; Otherwise it tries to read &quot;this file&quot; which isn&#39;t a Java file so cannot be parsed</span> <span class="w"> </span><span class="p">(</span><span class="s">&quot;apply&quot;</span><span class="w"> </span><span class="s">&quot;java.lang.String&quot;</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;io.jooby.Context&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span><span class="w"> </span><span class="no">nil</span><span class="p">))</span> <span class="w"> </span><span class="c1">;; This one actually gets called</span> <span class="w"> </span><span class="p">(</span><span class="s">&quot;apply&quot;</span><span class="w"> </span><span class="s">&quot;java.lang.Object&quot;</span><span class="w"> </span><span class="p">(</span><span class="s">&quot;io.jooby.Context&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">lambda</span><span class="w"> </span><span class="p">(</span><span class="nv">this</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">funcall</span><span class="w"> </span><span class="o">,</span><span class="nv">handler</span><span class="w"> </span><span class="nv">ctx</span><span class="p">))))))))</span> </pre></div> <p>You may wonder, why keep the original method around? Well it's because during reflection, ABCL says no such method that returns <code>String</code> exists in the <code>Handler</code> interface. 
That's fair I guess.</p> <h3 id="implementing-the-template">Implementing the template</h3><p>The Java example on the <a href="https://pebbletemplates.io/">Pebble homepage</a> is perfect:</p> <div class="highlight"><pre><span></span><span class="n">PebbleEngine</span><span class="w"> </span><span class="n">engine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">PebbleEngine</span><span class="p">.</span><span class="na">Builder</span><span class="p">().</span><span class="na">build</span><span class="p">();</span> <span class="n">PebbleTemplate</span><span class="w"> </span><span class="n">compiledTemplate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">engine</span><span class="p">.</span><span class="na">getTemplate</span><span class="p">(</span><span class="s">&quot;home.html&quot;</span><span class="p">);</span> <span class="n">Map</span><span class="o">&lt;</span><span class="n">String</span><span class="p">,</span><span class="w"> </span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">context</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">HashMap</span><span class="o">&lt;&gt;</span><span class="p">();</span> <span class="n">context</span><span class="p">.</span><span class="na">put</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Mitchell&quot;</span><span class="p">);</span> <span class="n">Writer</span><span class="w"> </span><span class="n">writer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">StringWriter</span><span class="p">();</span> <span class="n">compiledTemplate</span><span 
class="p">.</span><span class="na">evaluate</span><span class="p">(</span><span class="n">writer</span><span class="p">,</span><span class="w"> </span><span class="n">context</span><span class="p">);</span> <span class="n">String</span><span class="w"> </span><span class="n">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">writer</span><span class="p">.</span><span class="na">toString</span><span class="p">();</span> </pre></div> <p>We can easily translate this into Lisp:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">hashmap</span><span class="w"> </span><span class="p">(</span><span class="nv">alist</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;HashMap</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">el</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="nv">alist</span> <span class="w"> </span><span class="nb">do</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;put&quot;</span><span class="w"> </span><span class="nb">map</span><span class="w"> </span><span class="p">(</span><span class="nb">car</span><span class="w"> </span><span class="nv">el</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">cadr</span><span class="w"> </span><span class="nv">el</span><span class="p">)))</span> <span class="w"> </span><span class="nb">map</span><span class="p">))</span> <span class="p">(</span><span 
class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">ctx</span><span class="w"> </span><span class="p">(</span><span class="nv">hashmap</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">path</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jstatic</span><span class="w"> </span><span class="s">&quot;of&quot;</span><span class="w"> </span><span class="s">&quot;java.nio.file.Path&quot;</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">file</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;readString&quot;</span><span class="w"> </span><span class="ss">&#39;java.nio.file.Files</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">engine</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;build&quot;</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;PebbleEngine$Builder</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;getTemplate&quot;</span><span class="w"> </span><span class="nv">engine</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span> <span class="w"> </span><span 
class="p">(</span><span class="nv">writer</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;java.io.StringWriter</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;evaluate&quot;</span><span class="w"> </span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="nv">writer</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;toString&quot;</span><span class="w"> </span><span class="nv">writer</span><span class="p">)))</span> </pre></div> <p>But if you run this with <code>abcl --load main.lisp</code> and hit the <code>/search</code> endpoint, it will blow up at the call to <code>Path.of(filename)</code>, saying "no such method" exists.</p> <p>After digging around I saw it was because <a href="https://docs.oracle.com/en/java/javase/16/docs/api/java.base/java/nio/file/Path.html#of(java.lang.String,java.lang.String...%29">Path.of</a> is a variadic function.</p> <p>And while there are <a href="https://abcl.org/trac/changeset/15234">examples of</a> using variadic functions when the function only has a single parameter, like <code>java.util.Arrays.asList(T ...)</code>, employing that same technique here continued to result in "no such method":</p> <div class="highlight"><pre><span></span> (path (java:jstatic &quot;of&quot; &quot;java.nio.file.Path&quot; filename (java:jnew-array &quot;java.lang.String&quot; 0))) </pre></div> <p>Eventually I found an <a href="https://stackoverflow.com/questions/20440839/cant-invoke-method-with-varargs-parameters-with-reflection-nosuchmethodexcept">example of someone doing reflect/invoke on this kind of a function call</a> and tried this logic on a local copy of the ABCL source code.</p> <p>It worked.
So I opened a <a href="https://github.com/armedbear/abcl/pull/379">pull request</a>.</p> <p>So the full working code for <code>template</code> is:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">template</span><span class="w"> </span><span class="p">(</span><span class="nv">filename</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">let*</span><span class="w"> </span><span class="p">((</span><span class="nv">ctx</span><span class="w"> </span><span class="p">(</span><span class="nv">hashmap</span><span class="w"> </span><span class="nv">context-alist</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">path</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jstatic</span><span class="w"> </span><span class="s">&quot;of&quot;</span><span class="w"> </span><span class="s">&quot;java.nio.file.Path&quot;</span><span class="w"> </span><span class="nv">filename</span><span class="w"> </span><span class="p">(</span><span class="nv">java:jnew-array</span><span class="w"> </span><span class="s">&quot;java.lang.String&quot;</span><span class="w"> </span><span class="mi">0</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">file</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;readString&quot;</span><span class="w"> </span><span class="ss">&#39;java.nio.file.Files</span><span class="w"> </span><span class="nv">path</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">engine</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;build&quot;</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span 
class="ss">&#39;PebbleEngine$Builder</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;getTemplate&quot;</span><span class="w"> </span><span class="nv">engine</span><span class="w"> </span><span class="nv">filename</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nv">writer</span><span class="w"> </span><span class="p">(</span><span class="nv">jss:new</span><span class="w"> </span><span class="ss">&#39;java.io.StringWriter</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;evaluate&quot;</span><span class="w"> </span><span class="nv">compiledTmpl</span><span class="w"> </span><span class="nv">writer</span><span class="w"> </span><span class="nv">ctx</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="l l-Other">#&quot;toString&quot;</span><span class="w"> </span><span class="nv">writer</span><span class="p">)))</span> </pre></div> <p>And to get this diff running locally:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>~/vendor $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/vendor $<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/eatonphil/abcl $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>abcl $<span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>pe/more-variadic $<span class="w"> </span>sudo<span class="w"> </span><span class="o">{</span>dnf/brew/apt<span class="o">}</span><span class="w"> </span>install<span class="w"> </span>ant<span class="w"> </span>maven $<span class="w"> </span>ant<span class="w"> </span>-f<span class="w"> </span>build.xml </pre></div> <p>And to run 
<code>main.lisp</code> using this diff:</p> <div class="highlight"><pre><span></span><span class="o">$</span><span class="w"> </span><span class="o">~/</span><span class="n">vendor</span><span class="o">/</span><span class="n">abcl</span><span class="o">/</span><span class="n">abcl</span><span class="w"> </span><span class="o">--</span><span class="nb">load</span><span class="w"> </span><span class="n">main</span><span class="o">.</span><span class="n">lisp</span> </pre></div> <p>And to hit the API:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:8080/search &lt;html&gt; &lt;title&gt;Version<span class="w"> </span><span class="m">1</span>.0.0&lt;/title&gt; <span class="w"> </span>&lt;h2&gt;cat&lt;/h2&gt; <span class="w"> </span>&lt;h2&gt;dog&lt;/h2&gt; <span class="w"> </span>&lt;h2&gt;mouse&lt;/h2&gt; &lt;/html&gt; $<span class="w"> </span>curl<span class="w"> </span>localhost:8080/hello-world Hello<span class="w"> </span>world!% </pre></div> <p>Phew! Easy peasy.</p> <h3 id="next-up">Next up</h3><p>I'm porting this example to Kawa to see how it fares. Blog post to come.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">In a ridiculous attempt to prove an internet wrong about the practicality of Lisp (Common Lisp specifically), I tried to get a simple (but realistic) web app running. 
After four days and a patch to ABCL I got something working.<a href="https://t.co/5UUWNR8Wnn">https://t.co/5UUWNR8Wnn</a> <a href="https://t.co/cZsx32IlKD">pic.twitter.com/cZsx32IlKD</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1423345414279942150?ref_src=twsrc%5Etfw">August 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/practical-common-lisp-on-the-jvm.htmlThu, 05 Aug 2021 00:00:00 +0000Writing an efficient object previewer for JavaScripthttp://notes.eatonphil.com/writing-an-efficient-javascript-object-previewer.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-07-15-writing-an-efficient-javascript-object-previewer.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-07-15-writing-an-efficient-javascript-object-previewer.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/writing-an-efficient-javascript-object-previewer.htmlThu, 15 Jul 2021 00:00:00 +0000React without webpack: fast path to a working app from scratchhttp://notes.eatonphil.com/react-without-webpack.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-07-08-react-without-webpack.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-07-08-react-without-webpack.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/react-without-webpack.htmlThu, 08 Jul 2021 00:00:00 +0000Controlled HTML select element in React has weird default UXhttp://notes.eatonphil.com/controlled-select-element-in-react-has-weird-ux.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-06-25-select-in-react-broken-by-default.html'" /> </head><p>This is an external post of mine. 
Click <a href="https://datastation.multiprocess.io/blog/2021-06-25-select-in-react-broken-by-default.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/controlled-select-element-in-react-has-weird-ux.htmlFri, 25 Jun 2021 00:00:00 +0000Leaders, you need to share organization success stories more frequentlyhttp://notes.eatonphil.com/leaders-share-company-success-stories.html<p>This post goes out to anyone who leads a team: managers, directors, VPs, executives. You need to share organization success stories with your organization on a regular and frequent basis. Talk about sales wins, talk about new services released, talk about the positive impact of a recent organizational change. Just get in front of your entire organization and tell them how the organization is making a positive difference.</p> <p>Do this at least every other week.</p> <p>And in case it's not clear, by "success stories" I don't mean nonsense, or opinions. I mean concrete, measurable things that moved the organization forward.</p> <p>Everyone in your organization is contributing to these stories and it's your job to feed the stories back.</p> <p>Leaders have a tendency to hear about successes but don't always remember to propagate the stories down. I've been guilty of this myself. This post is your (and my own) friendly reminder.</p> <p>If you don't keep reminding your folks their organization is making a positive impact, they're going to forget it. You'll miss out on the freely available chance to give reassurance to your best people.</p> <p>Talented folks want to be invested in an organization that is succeeding.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post for all the people managers, directors, VPs out there: you need to regularly share success stories with your whole organization. 
Everyone wants to be part of an organization that is doing good work.<a href="https://t.co/XgaY5Ri1tA">https://t.co/XgaY5Ri1tA</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1407451413156929537?ref_src=twsrc%5Etfw">June 22, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/leaders-share-company-success-stories.htmlTue, 22 Jun 2021 00:00:00 +0000Languages you can run in the browser, part 1: Python, JavaScript, SQLitehttp://notes.eatonphil.com/languages-you-can-run-in-the-browser.html<head> <meta http-equiv="refresh" content="4;URL='https://datastation.multiprocess.io/blog/2021-06-16-languages-you-can-run-in-the-browser.html'" /> </head><p>This is an external post of mine. Click <a href="https://datastation.multiprocess.io/blog/2021-06-16-languages-you-can-run-in-the-browser.html">here</a> if you are not redirected.</p> http://notes.eatonphil.com/languages-you-can-run-in-the-browser.htmlThu, 17 Jun 2021 00:00:00 +0000Coolest hard-tech companies in NYC 2021http://notes.eatonphil.com/coolest-tech-companies-in-nyc-2021.html<p>For years I've kept a private list of really cool tech companies in NYC. Now that I'm funemployed it's the perfect time to publish. This list is influenced by 1) my perception of the difficulty of the engineering behind the product and 2) the company's educational and OSS presence.</p> <p>With no further ado and in no particular order, here's my list!</p> <h3 id="backtrace">Backtrace</h3><p>This company builds a product for debugging mobile crashes. Your app produces a crash dump and their debugger will help you figure out what went wrong. That's freaking awesome.</p> <p><a href="https://backtrace.io">https://backtrace.io</a></p> <h3 id="equinix-metal-(previously-packet)">Equinix Metal (previously Packet)</h3><p>This company provides an API around scheduling hardware servers in their datacenters, not virtual machines. 
That's nuts.</p> <p><a href="https://packet.com">https://packet.com</a></p> <h3 id="digital-ocean">DigitalOcean</h3><p>Ok, I used to work for Linode and am still a massive fan, but I love all the clouds and this post is about NYC, not Philly. If you want to learn how Linux works you have to work here.</p> <p><a href="https://www.digitalocean.com/">https://www.digitalocean.com/</a></p> <h3 id="ns1">NS1</h3><p>This company does DNS. Seeing as <a href="https://www.cyberciti.biz/humour/a-haiku-about-dns/">it was DNS</a>, if you want to understand how the internet works go work for this group.</p> <p><a href="https://ns1.com/">https://ns1.com/</a></p> <h3 id="squarespace">Squarespace</h3><p>The first program I made in 7th grade was a Java program that generated HTML from terminal prompts in my first attempt at a CMS. Stuff that builds stuff is amazing and Squarespace is kinda OG.</p> <p>They also just IPO-ed so the comp won't be imaginary!</p> <p>Disclosure: my wife works here, but they've been on my list longer than that.</p> <p><a href="https://www.squarespace.com/">https://www.squarespace.com/</a></p> <h3 id="grafana">Grafana</h3><p>Amazing platform. Everyone who can't afford Splunk or doesn't want to buy competitors' products uses Elasticsearch and Grafana. I didn't realize until double-checking my research that Grafana is even based in NYC. Let's hope they're hiring developers here.</p> <p><a href="https://grafana.com/">https://grafana.com/</a></p> <h3 id="frame.io">Frame.io</h3><p>It's like Figma for video.
Clearly the future.</p> <p><a href="https://www.frame.io/">https://www.frame.io/</a></p> <h3 id="datadog">Datadog</h3><p>Datadog feels like the only real competitor in hosted server analytics.</p> <p>Their stock has been doing surprisingly well, or maybe I'm just tired from WeWork, Uber, et al.</p> <p><a href="https://www.datadoghq.com/">https://www.datadoghq.com/</a></p> <h3 id="chronosphere">Chronosphere</h3><p>I'm a sucker for startups doing hosted data and search because that's really hard. Chronosphere does Uber-scale log storage/analysis.</p> <p><a href="https://chronosphere.io/">https://chronosphere.io/</a></p> <h3 id="cockroach-labs">Cockroach Labs</h3><p>Worst company name but maybe one of the single coolest products in NYC. They built a PostgreSQL-compatible scalable platform in Go. Everything about that is amazing.</p> <p>They've also turned down my application like 5 times now though so maybe they're very picky. :)</p> <p><a href="https://www.cockroachlabs.com/">https://www.cockroachlabs.com/</a></p> <h3 id="mongodb">MongoDB</h3><p>It's cloud scale! Need more be said?</p> <p><a href="https://www.mongodb.com/">https://www.mongodb.com/</a></p> <h3 id="trail-of-bits">Trail of Bits</h3><p>I don't actually understand what they do or if they have a product but their <a href="https://github.com/trailofbits">GitHub presence</a> is amazing and they're dedicated to educating the community, which is one of the most important things I think a company can do.</p> <p><a href="https://www.trailofbits.com/">https://www.trailofbits.com/</a></p> <h3 id="capsule8">Capsule8</h3><p>I moved to NYC for this company because the founders and product are insane. If you want to learn how compilers and Linux don't work, you've got to come here.</p> <p>Disclosure: I own stock.</p> <p><a href="https://capsule8.com/">https://capsule8.com/</a></p> <h3 id="two-sigma">Two Sigma</h3><p>Algorithmic trading? Maybe the smartest guys in NYC?
They don't accept candidates without bachelor's degrees or they just don't like me. ;) They also host the only good tech meetups in NYC: Linux User Group and Papers We Love.</p> <p><a href="https://www.twosigma.com/">https://www.twosigma.com/</a></p> <h3 id="jane-street">Jane Street</h3><p>Another algorithmic trading company but this time with OCaml. They're so crazy <a href="https://blog.janestreet.com/what-the-interns-have-wrought-2018/">you should see what the intern built</a>.</p> <p><a href="https://www.janestreet.com/">https://www.janestreet.com/</a></p> <h3 id="vimeo">Vimeo</h3><p>Everybody loves an underdog story. And the <a href="https://www.linkedin.com/pulse/now-shes-ceo-vimeo-after-rejected-dozens-companies-mamta-shah-/">CEO seems really cool</a>.</p> <p><a href="https://vimeo.com/">https://vimeo.com/</a></p> <h3 id="etsy">Etsy</h3><p>Their blog posts and engineering organization philosophy are highly regarded. And they've got a sweet headquarters in Brooklyn.</p> <p><a href="https://www.etsy.com/">https://www.etsy.com/</a></p> <h3 id="sisense">Sisense</h3><p>If you're not using Elasticsearch and you're not using Splunk, you might be using Sisense. Again, I'm a big sucker for data and analytics platforms.</p> <p><a href="https://www.sisense.com/">https://www.sisense.com/</a></p> <h3 id="codeacademy">Codecademy</h3><p>I am 100% on board with giving people opportunities in tech.</p> <p><a href="https://www.codecademy.com/">https://www.codecademy.com/</a></p> <h3 id="stack-overflow">Stack Overflow</h3><p>They were just bought! But they still exist I suppose. If you love .NET you've got to work here.</p> <p><a href="https://stackoverflow.com/">https://stackoverflow.com/</a></p> <h3 id="that's-it!">That's it!</h3><p>Tell me what you think and if I'm missing any hard-tech companies in NYC.
I'm sure I am.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here&#39;s my light-hearted take on some of the coolest tech companies in NYC in 2021<a href="https://twitter.com/Frame_io?ref_src=twsrc%5Etfw">@Frame_io</a> <a href="https://twitter.com/equinixmetal?ref_src=twsrc%5Etfw">@equinixmetal</a> <a href="https://twitter.com/digitalocean?ref_src=twsrc%5Etfw">@digitalocean</a> <a href="https://twitter.com/capsule8?ref_src=twsrc%5Etfw">@capsule8</a> <a href="https://twitter.com/NS1?ref_src=twsrc%5Etfw">@NS1</a> <a href="https://twitter.com/grafana?ref_src=twsrc%5Etfw">@grafana</a> <a href="https://twitter.com/CockroachDB?ref_src=twsrc%5Etfw">@CockroachDB</a> <a href="https://twitter.com/squarespace?ref_src=twsrc%5Etfw">@squarespace</a> <a href="https://twitter.com/chronosphereio?ref_src=twsrc%5Etfw">@chronosphereio</a> <a href="https://twitter.com/datadoghq?ref_src=twsrc%5Etfw">@datadoghq</a> <a href="https://twitter.com/MongoDB?ref_src=twsrc%5Etfw">@MongoDB</a> <a href="https://twitter.com/trailofbits?ref_src=twsrc%5Etfw">@trailofbits</a> <a href="https://twitter.com/twosigma?ref_src=twsrc%5Etfw">@twosigma</a> <a href="https://twitter.com/Vimeo?ref_src=twsrc%5Etfw">@Vimeo</a> <a href="https://twitter.com/Etsy?ref_src=twsrc%5Etfw">@Etsy</a> and more<a href="https://t.co/ZAcvptvLbZ">https://t.co/ZAcvptvLbZ</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1400815765117353989?ref_src=twsrc%5Etfw">June 4, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/coolest-tech-companies-in-nyc-2021.htmlFri, 04 Jun 2021 00:00:00 +0000Writing a Jinja-inspired template library in Pythonhttp://notes.eatonphil.com/writing-a-template-library-in-python.html<p>In this post we'll build a minimal text templating library in Python inspired by Jinja. 
It will be able to display variables and iterate over arrays.</p> <p>By the end of this article, with around 300 lines of code, we'll be able to create this program:</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">pytemplate</span> <span class="kn">import</span> <span class="n">eval_template</span> <span class="n">template</span> <span class="o">=</span> <span class="s1">&#39;&#39;&#39;</span> <span class="s1">&lt;html&gt;</span> <span class="s1"> &lt;body&gt;</span> <span class="s1"> {</span><span class="si">% f</span><span class="s1">or-in(post, posts) %}</span> <span class="s1"> &lt;article&gt;</span> <span class="s1"> &lt;h1&gt;{{ get(post, &#39;title&#39;) }}&lt;/h1&gt;</span> <span class="s1"> &lt;p&gt;</span> <span class="s1"> {{ get(post, &#39;body&#39;) }}</span> <span class="s1"> &lt;/p&gt;</span> <span class="s1"> &lt;/article&gt;</span> <span class="s1"> {</span><span class="si">% e</span><span class="s1">ndfor-in %}</span> <span class="s1"> &lt;/body&gt;</span> <span class="s1">&lt;/html&gt;</span> <span class="s1">&#39;&#39;&#39;</span> <span class="n">env</span> <span class="o">=</span> <span class="p">{</span> <span class="s1">&#39;posts&#39;</span><span class="p">:</span> <span class="p">[</span> <span class="p">{</span> <span class="s1">&#39;title&#39;</span><span class="p">:</span> <span class="s1">&#39;Hello world!&#39;</span><span class="p">,</span> <span class="s1">&#39;body&#39;</span><span class="p">:</span> <span class="s1">&#39;This is my first post!&#39;</span><span class="p">,</span> <span class="p">},</span> <span class="p">{</span> <span class="s1">&#39;title&#39;</span><span class="p">:</span> <span class="s1">&#39;Take two&#39;</span><span class="p">,</span> <span class="s1">&#39;body&#39;</span><span class="p">:</span> <span class="s1">&#39;This is a second post.&#39;</span><span class="p">,</span> <span class="p">},</span> <span class="p">],</span> <span class="p">}</span> <span 
class="nb">print</span><span class="p">(</span><span class="n">eval_template</span><span class="p">(</span><span class="n">template</span><span class="p">,</span> <span class="n">env</span><span class="p">))</span> </pre></div> <p>That runs and produces what we expect:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>test.py &lt;html&gt; <span class="w"> </span>&lt;body&gt; <span class="w"> </span>&lt;article&gt; <span class="w"> </span>&lt;h1&gt;Hello<span class="w"> </span>world!&lt;/h1&gt; <span class="w"> </span>&lt;p&gt; <span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>first<span class="w"> </span>post! <span class="w"> </span>&lt;/p&gt; <span class="w"> </span>&lt;/article&gt; <span class="w"> </span>&lt;article&gt; <span class="w"> </span>&lt;h1&gt;Take<span class="w"> </span>two&lt;/h1&gt; <span class="w"> </span>&lt;p&gt; <span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>second<span class="w"> </span>post. <span class="w"> </span>&lt;/p&gt; <span class="w"> </span>&lt;/article&gt; <span class="w"> </span>&lt;/body&gt; &lt;/html&gt; </pre></div> <p>All code is available on <a href="https://github.com/eatonphil/pytemplate">GitHub</a>. Let's dig in.</p> <h3 id="specification">Specification</h3><p>In this templating language, pytemplate, <code>{% $function () %} ... {% end$function %}</code> blocks are evaluated according to the particular function being called. For example, the <code>for-in ($iter_name, $array)</code> function will duplicate its children once for every element in <code>$array</code>. 
Within the body of the loop, the variable <code>$iter_name</code> will exist and be set to the current element in the array.</p> <p>While we won't implement it here, you can imagine what the <code>if ($test)</code> block function might do.</p> <h3 id="arguments,-expressions,-function-calls:-nodes">Arguments, expressions, function calls: nodes</h3><p>Function arguments are expressions (or <code>nodes</code> as we'll call them). They can be strings (surrounded by single quotes), identifiers found in a provided dictionary (or <code>environment</code> as we'll call it), or nested function calls (which are also nodes).</p> <h3 id="non-blocks:-tags">Non-blocks: tags</h3><p>Anything written in the non-block syntax <code>{{ ... }}</code> is simply called a tag. The inside of a tag is a node and is evaluated the same way a function argument is.</p> <h3 id="architecture">Architecture</h3><p>We'll break up the library into a few main parts:</p> <ul> <li>Lexer for the node language</li> <li>Parser for the node language</li> <li>Lexer for blocks, tags, and text</li> <li>Parser for blocks, tags, and text</li> <li>Interpreter that takes an AST and an environment dictionary and produces text</li> <li>An entrypoint to tie all the above together</li> </ul> <p>We'll tackle these parts in roughly reverse order.</p> <h3 id="entrypoint">Entrypoint</h3><p>When we call the library, we want to pass in just a template string and an environment dictionary. 
The result of the entrypoint will be the evaluated template.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">io</span> <span class="k">def</span> <span class="nf">eval_template</span><span class="p">(</span><span class="n">template</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">env</span><span class="p">:</span> <span class="nb">dict</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="n">template</span><span class="p">)</span> <span class="n">ast</span><span class="p">,</span> <span class="n">_</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="k">with</span> <span class="n">io</span><span class="o">.</span><span class="n">StringIO</span><span class="p">()</span> <span class="k">as</span> <span class="n">memfd</span><span class="p">:</span> <span class="n">interpret</span><span class="p">(</span><span class="n">memfd</span><span class="p">,</span> <span class="n">ast</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="k">return</span> <span class="n">memfd</span><span class="o">.</span><span class="n">getvalue</span><span class="p">()</span> </pre></div> <p>Where lex, parse, and interpret have to do with the block- and tag-level language.</p> <h3 id="block,-tag-and-text-lexing">Block, tag and text lexing</h3><p>This process is responsible for turning the template string into an array of tokens. To make the code simpler, lexing for the function call and expression language is done separately. At this stage all we'll look for is tokens consisting of block and tag end and beginning markers. 
So just <code>{%</code>, <code>%}</code>, <code>{{</code>, <code>}}</code>. If a token is not one of these, it is regular text.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="n">BLOCK_OPEN</span> <span class="o">=</span> <span class="s1">&#39;{%&#39;</span> <span class="n">BLOCK_CLOSE</span> <span class="o">=</span> <span class="s1">&#39;%}&#39;</span> <span class="n">TAG_OPEN</span> <span class="o">=</span> <span class="s1">&#39;{{&#39;</span> <span class="n">TAG_CLOSE</span> <span class="o">=</span> <span class="s1">&#39;}}&#39;</span> <span class="k">def</span> <span class="nf">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">):</span> <span class="k">if</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="mi">0</span><span class="p">:</span> <span class="k">return</span> <span class="kc">None</span> <span class="k">if</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span> <span class="k">return</span> <span class="n">source</span><span class="p">[</span><span class="n">cursor</span><span class="p">]</span> <span class="k">return</span> <span class="kc">None</span> <span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">source</span><span class="p">):</span> <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">while</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span> <span class="n">char</span> <span 
class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="k">if</span> <span class="n">char</span> <span class="o">==</span> <span class="s1">&#39;{&#39;</span><span class="p">:</span> <span class="c1"># Handle escaping {</span> <span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="s1">&#39;{&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">continue</span> <span class="n">next_char</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="k">if</span> <span class="n">next_char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;%&#39;</span><span class="p">,</span> <span class="s1">&#39;{&#39;</span><span class="p">]:</span> <span class="k">if</span> <span class="n">current</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span> <span class="s1">&#39;cursor&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span> <span class="p">})</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">tokens</span><span 
class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">BLOCK_OPEN</span> <span class="k">if</span> <span class="n">next_char</span> <span class="o">==</span> <span class="s1">&#39;%&#39;</span> <span class="k">else</span> <span class="n">TAG_OPEN</span><span class="p">,</span> <span class="s1">&#39;cursor&#39;</span><span class="p">:</span> <span class="n">cursor</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;%&#39;</span><span class="p">,</span> <span class="s1">&#39;}&#39;</span><span class="p">]:</span> <span class="c1"># Handle escaping % and }</span> <span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="o">==</span> <span class="n">char</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">!=</span> <span class="s1">&#39;}&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">current</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span 
class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span> <span class="s1">&#39;cursor&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span> <span class="p">})</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">BLOCK_CLOSE</span> <span class="k">if</span> <span class="n">char</span> <span class="o">==</span> <span class="s1">&#39;%&#39;</span> <span class="k">else</span> <span class="n">TAG_CLOSE</span><span class="p">,</span> <span class="s1">&#39;cursor&#39;</span><span class="p">:</span> <span class="n">cursor</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span> <span class="k">continue</span> <span class="n">current</span> <span class="o">+=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">current</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span> <span class="s1">&#39;cursor&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">-</span> <span class="nb">len</span><span class="p">(</span><span class="n">current</span><span class="p">),</span> <span class="p">})</span> <span 
class="k">return</span> <span class="n">tokens</span> </pre></div> <p>That's it for lexing!</p> <h3 id="block,-tag-and-text-parsing">Block, tag and text parsing</h3><p>Next up is matching opening and closing symbols in the array of tokens. There are a few main rules we'll enforce:</p> <ul> <li>Every open tag symbol <code>{{</code> must be followed by a text token and then a closing tag symbol <code>}}</code><ul> <li>The text between the open and close tag symbols must parse into a valid expression (we'll define this logic later)</li> </ul> </li> <li>Every block symbol <code>{%</code> must be followed by a text token and then an end of block symbol <code>%}</code><ul> <li>The text token between the open and close block symbols must parse into a valid function call (we'll define this logic later)</li> </ul> </li> <li>Every block must have a matching end block whose text is <code>end</code> prepended to the name of the function called in the start block (so <code>endfor-in</code> closes <code>for-in</code>)<ul> <li>The text between the start and end blocks can contain nested blocks or tags</li> </ul> </li> </ul> <p>Let's codify that:</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">end_of_block_marker</span><span class="o">=</span><span class="kc">None</span><span class="p">):</span> <span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">ast</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">while</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span 
class="n">cursor</span><span class="p">)</span> <span class="n">value</span> <span class="o">=</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">TAG_OPEN</span><span class="p">:</span> <span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">2</span><span class="p">)[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">!=</span> <span class="n">TAG_CLOSE</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected closing tag&#39;</span><span class="p">)</span> <span class="n">node_tokens</span> <span class="o">=</span> <span class="n">lex_node</span><span class="p">(</span><span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)[</span><span class="s1">&#39;value&#39;</span><span class="p">])</span> <span class="n">node_ast</span> <span class="o">=</span> <span class="n">parse_node</span><span class="p">(</span><span class="n">node_tokens</span><span class="p">)</span> <span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;tag&#39;</span><span class="p">,</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">node_ast</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">3</span> <span class="k">continue</span> <span class="k">if</span> <span 
class="n">value</span> <span class="o">==</span> <span class="n">TAG_CLOSE</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected opening tag&#39;</span><span class="p">)</span> <span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">BLOCK_OPEN</span><span class="p">:</span> <span class="k">if</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">2</span><span class="p">)[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">!=</span> <span class="n">BLOCK_CLOSE</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected end of block open&#39;</span><span class="p">)</span> <span class="n">block</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="n">node_tokens</span> <span class="o">=</span> <span class="n">lex_node</span><span class="p">(</span><span class="n">block</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">])</span> <span class="n">node_ast</span> <span class="o">=</span> <span class="n">parse_node</span><span class="p">(</span><span class="n">node_tokens</span><span class="p">)</span> <span class="k">if</span> <span class="n">end_of_block_marker</span> <span class="ow">and</span> <span class="s1">&#39;end&#39;</span><span class="o">+</span><span class="n">end_of_block_marker</span> <span class="o">==</span> <span class="n">node_ast</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]:</span> <span 
class="k">return</span> <span class="n">ast</span><span class="p">,</span> <span class="n">cursor</span><span class="o">+</span><span class="mi">3</span> <span class="n">child</span><span class="p">,</span> <span class="n">cursor_offset</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="o">+</span><span class="mi">3</span><span class="p">:],</span> <span class="n">node_ast</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">])</span> <span class="k">if</span> <span class="n">cursor_offset</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Failed to find end of block&#39;</span><span class="p">)</span> <span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;block&#39;</span><span class="p">,</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">node_ast</span><span class="p">,</span> <span class="s1">&#39;child&#39;</span><span class="p">:</span> <span class="n">child</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="n">cursor_offset</span> <span class="o">+</span> <span class="mi">3</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">value</span> <span class="o">==</span> <span class="n">BLOCK_CLOSE</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected start of block open&#39;</span><span class="p">)</span> <span class="n">ast</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> 
<span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;text&#39;</span><span class="p">,</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">t</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">return</span> <span class="n">ast</span><span class="p">,</span> <span class="n">cursor</span> </pre></div> <p>And that's it for parsing blocks and tags. Now we have to get into the node language.</p> <h3 id="node-lexing">Node lexing</h3><p>In the node language, everything is either a literal or a function call. Whitespace is ignored. The only special symbols in the node language are commas and parentheses.</p> <p>So to break the text into tokens, we iterate over the characters, accumulating them until we reach whitespace or a symbol. The accumulated characters become a literal token and each symbol becomes a token of its own. Whitespace is discarded.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_node</span><span class="p">(</span><span class="n">source</span><span class="p">):</span> <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span> <span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="k">while</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">source</span><span class="p">):</span> <span class="n">char</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span 
class="p">[</span><span class="s1">&#39;</span><span class="se">\r</span><span class="s1">&#39;</span><span class="p">,</span> <span class="s1">&#39;</span><span class="se">\t</span><span class="s1">&#39;</span><span class="p">,</span> <span class="s1">&#39;</span><span class="se">\n</span><span class="s1">&#39;</span><span class="p">,</span> <span class="s1">&#39; &#39;</span><span class="p">]:</span> <span class="k">if</span> <span class="n">current</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;literal&#39;</span><span class="p">,</span> <span class="p">})</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">char</span> <span class="ow">in</span> <span class="p">[</span><span class="s1">&#39;(&#39;</span><span class="p">,</span> <span class="s1">&#39;)&#39;</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">]:</span> <span class="k">if</span> <span class="n">current</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">current</span><span class="p">,</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;literal&#39;</span><span class="p">,</span> <span class="p">})</span> <span class="n">current</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">tokens</span><span class="o">.</span><span 
class="n">append</span><span class="p">({</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">char</span><span class="p">,</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;syntax&#39;</span><span class="p">,</span> <span class="p">})</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">continue</span> <span class="n">current</span> <span class="o">+=</span> <span class="n">char</span> <span class="n">cursor</span> <span class="o">+=</span><span class="mi">1</span> <span class="k">return</span> <span class="n">tokens</span> </pre></div> <p>And that's it for node lexing.</p> <h3 id="node-parsing">Node parsing</h3><p>We'll break this up into two functions. The first is just for parsing literals and function calls.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_node</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">ast</span> <span class="o">=</span> <span class="kc">None</span> <span class="k">while</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="k">if</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;type&#39;</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;literal&#39;</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span 
class="s1">&#39;Expected literal&#39;</span><span class="p">)</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="n">next_t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">next_t</span><span class="p">:</span> <span class="n">ast</span> <span class="o">=</span> <span class="n">t</span> <span class="k">break</span> <span class="k">if</span> <span class="n">next_t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;(&#39;</span><span class="p">:</span> <span class="n">ast</span> <span class="o">=</span> <span class="n">t</span> <span class="k">break</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">next_t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;(&#39;</span><span class="p">:</span> <span class="n">args</span><span class="p">,</span> <span class="n">cursor</span> <span class="o">=</span> <span class="n">parse_node_args</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="p">:])</span> <span class="n">ast</span> <span class="o">=</span> <span class="p">{</span> <span class="s1">&#39;type&#39;</span><span class="p">:</span> <span class="s1">&#39;function&#39;</span><span class="p">,</span> <span class="s1">&#39;value&#39;</span><span class="p">:</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span><span class="o">.</span><span class="n">strip</span><span class="p">(),</span> <span 
class="s1">&#39;args&#39;</span><span class="p">:</span> <span class="n">args</span><span class="p">,</span> <span class="p">}</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">2</span> <span class="k">break</span> <span class="k">if</span> <span class="n">cursor</span> <span class="o">!=</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Failed to parse node: &#39;</span> <span class="o">+</span> <span class="n">tokens</span><span class="p">[</span><span class="n">cursor</span><span class="p">][</span><span class="s1">&#39;value&#39;</span><span class="p">])</span> <span class="k">return</span> <span class="n">ast</span> </pre></div> <p>The second is for parsing function call arguments.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_node_args</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">args</span> <span class="o">=</span> <span class="p">[]</span> <span class="n">cursor</span> <span class="o">=</span> <span class="mi">0</span> <span class="k">while</span> <span class="n">cursor</span> <span class="o">&lt;</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">t</span> <span class="o">=</span> <span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">)</span> <span class="k">if</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;)&#39;</span><span class="p">:</span> <span class="k">return</span> <span class="n">args</span><span 
class="p">,</span> <span class="n">cursor</span> <span class="o">+</span> <span class="mi">1</span> <span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="ow">and</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;,&#39;</span><span class="p">:</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">args</span><span class="p">)</span> <span class="ow">and</span> <span class="n">t</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="o">!=</span> <span class="s1">&#39;,&#39;</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected comma to separate args&#39;</span><span class="p">)</span> <span class="n">args</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">getelement</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="n">cursor</span><span class="p">))</span> <span class="n">cursor</span> <span class="o">+=</span> <span class="mi">1</span> <span class="k">return</span> <span class="n">args</span><span class="p">,</span> <span class="n">cursor</span> </pre></div> <p>And that's it for lexing and parsing the entire template and node language!</p> <h3 id="interpreting">Interpreting</h3><p>Interpreting is a matter of iterating over the AST recursively, writing out literal text, evaluating the contents of tags, and doing special processing for blocks.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span 
class="nf">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">ast</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span> <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">ast</span><span class="p">:</span> <span class="n">item_type</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s1">&#39;type&#39;</span><span class="p">]</span> <span class="n">node</span> <span class="o">=</span> <span class="n">item</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">&#39;text&#39;</span><span class="p">:</span> <span class="n">outfd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">])</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">&#39;tag&#39;</span><span class="p">:</span> <span class="n">tag_value</span> <span class="o">=</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="n">outfd</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">tag_value</span><span class="p">)</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">item_type</span> <span class="o">==</span> <span class="s1">&#39;block&#39;</span><span class="p">:</span> <span class="n">interpret_block</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">item</span><span 
class="p">[</span><span class="s1">&#39;child&#39;</span><span class="p">],</span> <span class="n">env</span><span class="p">)</span> <span class="k">continue</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Unknown type: &#39;</span> <span class="o">+</span> <span class="n">item_type</span><span class="p">)</span> </pre></div> <h4 id="intepreting-nodes">Interpreting nodes</h4><p>A node is one of two things:</p> <ul> <li>A literal, which is either a<ul> <li>String if surrounded by single quotes</li> <li>Otherwise an identifier to be looked up in the environment dictionary</li> </ul> </li> <li>Or a function call</li> </ul> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">interpret_node</span><span class="p">(</span><span class="n">node</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span> <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;type&#39;</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;literal&#39;</span><span class="p">:</span> <span class="c1"># Is a string</span> <span class="k">if</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">][</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="s2">&quot;&#39;&quot;</span> <span class="ow">and</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">][</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="o">==</span> <span class="s2">&quot;&#39;&quot;</span><span class="p">:</span> <span class="k">return</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">][</span><span class="mi">1</span><span 
class="p">:</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="c1"># Default to an env lookup</span> <span class="k">return</span> <span class="n">env</span><span class="p">[</span><span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]]</span> <span class="n">function</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="n">args</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;args&#39;</span><span class="p">]</span> </pre></div> <p>Let's define <code>==</code> which checks if all args are equal. First we have to interpret all args and then we return True if they are all equal.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">&#39;==&#39;</span><span class="p">:</span> <span class="n">arg_vals</span> <span class="o">=</span> <span class="p">[</span><span class="n">interpret_node</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span> <span class="k">if</span> <span class="n">arg_vals</span><span class="o">.</span><span class="n">count</span><span class="p">(</span><span class="n">arg_vals</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">arg_vals</span><span class="p">):</span> <span class="k">return</span> <span class="kc">True</span> <span class="k">return</span> <span class="kc">False</span> </pre></div> <p>Now let's define a helper 
for retrieving an entry from a dictionary, called <code>get</code>. This will evaluate its first arg and assume it is a dictionary. Then it will evaluate its second arg and assume it is a key in the dictionary. Then it will return the result of looking up the key in the dictionary.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">&#39;get&#39;</span><span class="p">:</span> <span class="n">arg_vals</span> <span class="o">=</span> <span class="p">[</span><span class="n">interpret_node</span><span class="p">(</span><span class="n">arg</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="k">for</span> <span class="n">arg</span> <span class="ow">in</span> <span class="n">args</span><span class="p">]</span> <span class="k">return</span> <span class="n">arg_vals</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="n">arg_vals</span><span class="p">[</span><span class="mi">1</span><span class="p">]]</span> </pre></div> <p>And if it's neither of these supported functions, just raise an error.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Unknown function: &#39;</span> <span class="o">+</span> <span class="n">function</span><span class="p">)</span> </pre></div> <h4 id="interpreting-blocks">Interpreting blocks</h4><p>Blocks are just a little different than a generic node. 
In addition to being evaluated they act on a child AST within the start and end of the block.</p> <p>For example, in an <code>if</code> block we will evaluate its argument and recursively call <code>interpret</code> on the child AST if the argument is truthy.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">interpret_block</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">node</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span> <span class="n">function</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="n">args</span> <span class="o">=</span> <span class="n">node</span><span class="p">[</span><span class="s1">&#39;args&#39;</span><span class="p">]</span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">&#39;if&#39;</span><span class="p">:</span> <span class="c1"># Evaluate the condition (the block&#39;s first argument), not the block node itself</span> <span class="k">if</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">env</span><span class="p">):</span> <span class="n">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">env</span><span class="p">)</span> <span class="c1"># Return either way so a falsy condition just skips the child</span> <span class="k">return</span> </pre></div> <p>And for <code>for-in</code> we will use the first argument as the name of an identifier to be copied into a child environment dictionary. 
We'll interpret the second argument and then iterate over it, calling <code>interpret</code> recursively for each item in the array and passing the child environment dictionary so it has access to the current element.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span> <span class="k">if</span> <span class="n">function</span> <span class="o">==</span> <span class="s1">&#39;for-in&#39;</span><span class="p">:</span> <span class="n">loop_variable</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="n">loop_iter_variable</span> <span class="o">=</span> <span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="s1">&#39;value&#39;</span><span class="p">]</span> <span class="k">for</span> <span class="n">elem</span> <span class="ow">in</span> <span class="n">interpret_node</span><span class="p">(</span><span class="n">loop_variable</span><span class="p">,</span> <span class="n">env</span><span class="p">):</span> <span class="n">child_env</span> <span class="o">=</span> <span class="n">env</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">child_env</span><span class="p">[</span><span class="n">loop_iter_variable</span><span class="p">]</span> <span class="o">=</span> <span class="n">elem</span> <span class="n">interpret</span><span class="p">(</span><span class="n">outfd</span><span class="p">,</span> <span class="n">child</span><span class="p">,</span> <span class="n">child_env</span><span class="p">)</span> <span class="k">return</span> </pre></div> <p>Just like before, if we see a block we don't support yet, throw an error.</p> <p><span class="code-caption">pytemplate.py</span></p> <div class="highlight"><pre><span></span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span 
class="s1">&#39;Unsupported block node function: &#39;</span> <span class="o">+</span> <span class="n">function</span><span class="p">)</span> </pre></div> <p>And that's that. :)</p> <h3 id="run-it">Run it</h3><p>Now we can give the example from the beginning a shot.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>test.py &lt;html&gt; <span class="w"> </span>&lt;body&gt; <span class="w"> </span>&lt;article&gt; <span class="w"> </span>&lt;h1&gt;Hello<span class="w"> </span>world!&lt;/h1&gt; <span class="w"> </span>&lt;p&gt; <span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>my<span class="w"> </span>first<span class="w"> </span>post! <span class="w"> </span>&lt;/p&gt; <span class="w"> </span>&lt;/article&gt; <span class="w"> </span>&lt;article&gt; <span class="w"> </span>&lt;h1&gt;Take<span class="w"> </span>two&lt;/h1&gt; <span class="w"> </span>&lt;p&gt; <span class="w"> </span>This<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>second<span class="w"> </span>post. <span class="w"> </span>&lt;/p&gt; <span class="w"> </span>&lt;/article&gt; <span class="w"> </span>&lt;/body&gt; &lt;/html&gt; </pre></div> <p>Pretty sweet for only 300 lines of Python!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Been wanting to write a Python template library ever since I failed trying to do so years ago in Standard ML. 
Here&#39;s my take on a Jinja-like library!<a href="https://t.co/P1nAV6fSxk">https://t.co/P1nAV6fSxk</a> <a href="https://t.co/DbXQt1JYx8">pic.twitter.com/DbXQt1JYx8</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1396535283190046722?ref_src=twsrc%5Etfw">May 23, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/writing-a-template-library-in-python.htmlSun, 23 May 2021 00:00:00 +0000Learning a new codebase: hacking on nginxhttp://notes.eatonphil.com/learning-a-new-codebase-hacking-nginx.html<p>I have never contributed to nginx. My C skills are 1/10. But downloading the source, hacking it up, compiling it, and running it doesn't scare me. This post is to help you overcome your own fears about doing so. Not necessarily because you should be running out-of-tree diffs in production but because I see a lot of developers never even consider looking at the source of a big tool or dependency they use.</p> <p>Most of all, studying mature software projects is one of the best ways to grow as a programmer.</p> <h3 id="source-and-build">Source and build</h3><p>At a high level, the steps for hacking on software projects are always the same:</p> <ol> <li>Find/download the source code</li> <li>Install necessary dependency libraries/compilers</li> <li>Start grepping around based on something you see in the output or capabilities you know exist</li> <li>Make a change</li> <li>Run some variation of <code>./configure &amp;&amp; make</code> to build</li> <li>Run the program</li> <li>Go back to step 4 until you're happy</li> </ol> <h3 id="nginx">nginx</h3><p>Let's follow these steps for nginx. 
We google <code>nginx github</code> to learn that there's a read-only copy of the source on <a href="https://github.com/nginx/nginx">GitHub</a>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>~/vendor $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/vendor $<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/nginx/nginx $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>nginx </pre></div> <p>There's no readme, bummer. We google <code>nginx build from source</code> and find <a href="http://nginx.org/en/docs/configure.html">this</a>. We see it's a typical C project that builds exactly as guessed: <code>./configure &amp;&amp; make</code>. And it doesn't look like it has any third-party dependencies besides my C compiler.</p> <p>Install autoconf, gmake, and a C compiler. There's no <code>./configure</code> file in this directory but notice there is a <code>configure</code> file in <code>auto</code>. Trying <code>cd auto &amp;&amp; ./configure</code> crashes so let's try <code>./auto/configure</code>. That gets further but stops with this error:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./auto/configure ... ./auto/configure:<span class="w"> </span>error:<span class="w"> </span>the<span class="w"> </span>HTTP<span class="w"> </span>rewrite<span class="w"> </span>module<span class="w"> </span>requires<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library. 
You<span class="w"> </span>can<span class="w"> </span>either<span class="w"> </span>disable<span class="w"> </span>the<span class="w"> </span>module<span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>--without-http_rewrite_module option,<span class="w"> </span>or<span class="w"> </span>install<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library<span class="w"> </span>into<span class="w"> </span>the<span class="w"> </span>system,<span class="w"> </span>or<span class="w"> </span>build<span class="w"> </span>the<span class="w"> </span>PCRE<span class="w"> </span>library statically<span class="w"> </span>from<span class="w"> </span>the<span class="w"> </span><span class="nb">source</span><span class="w"> </span>with<span class="w"> </span>nginx<span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>--with-pcre<span class="o">=</span>&lt;path&gt;<span class="w"> </span>option. </pre></div> <p>Run <code>./auto/configure --without-http_rewrite_module</code>. When that fails as well, run it again, this time also disabling <code>http_gzip_module</code> with <code>--without-http_gzip_module</code>.</p> <p>Ok autoconfigure is done. Now we've got a Makefile. Run <code>make -j</code> to compile in parallel.</p> <p>Run <code>git status</code> to see where the binary was placed. 
Run <code>ls objs</code> and there it is, great:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>ls<span class="w"> </span>objs autoconf.err<span class="w"> </span>nginx<span class="w"> </span>ngx_auto_config.h<span class="w"> </span>ngx_modules.c<span class="w"> </span>src Makefile<span class="w"> </span>nginx.8<span class="w"> </span>ngx_auto_headers.h<span class="w"> </span>ngx_modules.o </pre></div> <h3 id="the-hack">The hack</h3><p>We want a simple <code>dump</code> command that will return a literal string in a <code>location</code> block. 
So something like this:</p> <div class="highlight"><pre><span></span>$ diff --git a/conf/nginx.conf b/conf/nginx.conf <span class="gh">index 29bc085f..e96e817f 100644</span> <span class="gd">--- a/conf/nginx.conf</span> <span class="gi">+++ b/conf/nginx.conf</span> <span class="gu">@@ -41,8 +41,7 @@ http {</span> #access_log logs/host.access.log main; location / { <span class="gd">- root html;</span> <span class="gd">- index index.html index.htm;</span> <span class="gi">+ dump &#39;It was a good Thursday.&#39;;</span> <span class="w"> </span> } <span class="w"> </span> #error_page 404 /404.html; } </pre></div> <p>Now that we've built nginx we can use the <code>-t</code> flag to test the validity of this config:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>./objs/nginx<span class="w"> </span>-t<span class="w"> </span>-c<span class="w"> </span><span class="k">$(</span><span class="nb">pwd</span><span class="k">)</span>/conf/nginx.conf nginx:<span class="w"> </span><span class="o">[</span>alert<span class="o">]</span><span class="w"> </span>could<span class="w"> </span>not<span class="w"> </span>open<span class="w"> </span>error<span class="w"> </span>log<span class="w"> </span>file:<span class="w"> </span>open<span class="o">()</span><span class="w"> </span><span class="s2">&quot;/usr/local/nginx/logs/error.log&quot;</span><span class="w"> </span>failed<span class="w"> </span><span class="o">(</span><span class="m">2</span>:<span class="w"> </span>No<span class="w"> </span>such<span class="w"> </span>file<span class="w"> </span>or<span class="w"> </span>directory<span class="o">)</span> <span class="m">2021</span>/04/04<span class="w"> </span><span class="m">21</span>:24:09<span class="w"> </span><span class="o">[</span>emerg<span class="o">]</span><span class="w"> </span><span class="m">1030951</span><span class="c1">#0: unknown directive &quot;dump&quot; in /home/phil/vendor/nginx/conf/nginx.conf:44</span> nginx:<span class="w"> 
</span>configuration<span class="w"> </span>file<span class="w"> </span>/home/phil/vendor/nginx/conf/nginx.conf<span class="w"> </span><span class="nb">test</span><span class="w"> </span>failed </pre></div> <p>And now we've got something to go on! Clearly we have to register this directive and the log gives us enough info to start grepping:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span><span class="s1">&#39;unknown directive&#39;</span> src/core/ngx_conf_file.c:<span class="w"> </span><span class="s2">&quot;unknown directive \&quot;%s\&quot;&quot;</span>,<span class="w"> </span>name-&gt;data<span class="o">)</span><span class="p">;</span> </pre></div> <p>The case that has this failing comes from line 463: <code>rv = cmd-&gt;set(cf, cmd, conf)</code>. So let's see what this <code>set</code> does. <code>git grep set</code> is useless. Let's try finding out what <code>cmd</code> is so we can locate the struct that has <code>set</code> on it. Ah it's an <code>ngx_command_t</code>. Since it doesn't have <code>struct</code> in front of it, it must be typedef-ed, and the typedef will likely have a <code>;</code> right after the name. 
So <code>git grep ngx_command_t\;</code> finds us:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_command_t<span class="se">\;</span> src/core/ngx_core.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_command_s<span class="w"> </span>ngx_command_t<span class="p">;</span> </pre></div> <p>Which means the implementation is hidden, so grep for ngx_command_s:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_command_s src/core/ngx_conf_file.h:struct<span class="w"> </span>ngx_command_s<span class="w"> </span><span class="o">{</span> src/core/ngx_core.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_command_s<span class="w"> </span>ngx_command_t<span class="p">;</span> </pre></div> <p>Ok this is going nowhere. Different approach. What command did we remove?</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff diff<span class="w"> </span>--git<span class="w"> </span>a/conf/nginx.conf<span class="w"> </span>b/conf/nginx.conf index<span class="w"> </span>29bc085f..e96e817f<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/conf/nginx.conf +++<span class="w"> </span>b/conf/nginx.conf @@<span class="w"> </span>-41,8<span class="w"> </span>+41,7<span class="w"> </span>@@<span class="w"> </span>http<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="c1">#access_log logs/host.access.log main;</span> <span class="w"> </span>location<span class="w"> </span>/<span class="w"> </span><span class="o">{</span> -<span class="w"> </span>root<span class="w"> </span>html<span class="p">;</span> -<span class="w"> </span>index<span class="w"> </span>index.html<span class="w"> 
</span>index.htm<span class="p">;</span> +<span class="w"> </span>dump<span class="w"> </span><span class="s1">&#39;It was a good Thursday.&#39;</span><span class="p">;</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="c1">#error_page 404 /404.html;</span> </pre></div> <p><code>root</code> is a command. Maybe we can copy that.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span><span class="se">\&quot;</span>root<span class="se">\&quot;</span> docs/xml/nginx/changes.xml:in<span class="w"> </span>the<span class="w"> </span><span class="s2">&quot;root&quot;</span><span class="w"> </span>or<span class="w"> </span><span class="s2">&quot;auth_basic_user_file&quot;</span><span class="w"> </span>directives. docs/xml/nginx/changes.xml:a<span class="w"> </span>request<span class="w"> </span>was<span class="w"> </span>handled<span class="w"> </span>incorrectly,<span class="w"> </span><span class="k">if</span><span class="w"> </span>a<span class="w"> </span><span class="s2">&quot;root&quot;</span><span class="w"> </span>directive<span class="w"> </span>used<span class="w"> </span>variables<span class="p">;</span> docs/xml/nginx/changes.xml:the<span class="w"> </span><span class="nv">$document_root</span><span class="w"> </span>variable<span class="w"> </span>usage<span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span><span class="s2">&quot;root&quot;</span><span class="w"> </span>and<span class="w"> </span><span class="s2">&quot;alias&quot;</span><span class="w"> </span>directives docs/xml/nginx/changes.xml:the<span class="w"> </span><span class="nv">$document_root</span><span class="w"> </span>variable<span class="w"> </span>did<span class="w"> </span>not<span class="w"> </span>support<span class="w"> </span>the<span class="w"> </span>variables<span class="w"> </span><span 
class="k">in</span><span class="w"> </span>the<span class="w"> </span><span class="s2">&quot;root&quot;</span> docs/xml/nginx/changes.xml:if<span class="w"> </span>a<span class="w"> </span><span class="s2">&quot;root&quot;</span><span class="w"> </span>was<span class="w"> </span>specified<span class="w"> </span>by<span class="w"> </span>variable<span class="w"> </span>only,<span class="w"> </span><span class="k">then</span><span class="w"> </span>the<span class="w"> </span>root<span class="w"> </span>was<span class="w"> </span>relative src/http/ngx_http_core_module.c:<span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">&quot;root&quot;</span><span class="o">)</span>, src/http/ngx_http_core_module.c:<span class="w"> </span><span class="p">&amp;</span>cmd-&gt;name,<span class="w"> </span>clcf-&gt;alias<span class="w"> </span>?<span class="w"> </span><span class="s2">&quot;alias&quot;</span><span class="w"> </span>:<span class="w"> </span><span class="s2">&quot;root&quot;</span><span class="o">)</span><span class="p">;</span> </pre></div> <p>That looks more promising. 
Let's copy that:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src/http/ diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c<span class="w"> </span>index<span class="w"> </span>9b94b328..17a64e80<span class="w"> </span><span class="m">100644</span><span class="w"> </span>---<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>+++<span class="w"> </span>b/src/http/ngx_http_core_module.c<span class="w"> </span>@@<span class="w"> </span>-331,6<span class="w"> </span>+331,14<span class="w"> </span>@@<span class="w"> </span>static<span class="w"> </span>ngx_command_t<span class="w"> </span>ngx_http_core_commands<span class="o">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="m">0</span>, <span class="w"> </span>NULL<span class="w"> </span><span class="o">}</span>, +<span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">&quot;dump&quot;</span><span class="o">)</span>, +<span class="w"> </span>NGX_HTTP_MAIN_CONF<span class="p">|</span>NGX_HTTP_SRV_CONF<span class="p">|</span>NGX_HTTP_LOC_CONF<span class="p">|</span>NGX_HTTP_LIF_CONF +<span class="w"> </span><span class="p">|</span>NGX_CONF_TAKE1, +<span class="w"> </span>ngx_http_core_dump, +<span class="w"> </span>NGX_HTTP_LOC_CONF_OFFSET, +<span class="w"> </span><span class="m">0</span>, +<span class="w"> </span>NULL<span class="w"> </span><span class="o">}</span>, + <span class="w"> </span><span class="o">{</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">&quot;alias&quot;</span><span class="o">)</span>, <span class="w"> </span>NGX_HTTP_LOC_CONF<span class="p">|</span>NGX_CONF_TAKE1, <span 
class="w"> </span>ngx_http_core_root, </pre></div> <p>Ok so this is how a command is registered. It obviously won't build without <code>ngx_http_core_dump</code> so let's implement that by copying/renaming <code>ngx_http_core_root</code>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c index<span class="w"> </span>9b94b328..c184dab5<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.c +++<span class="w"> </span>b/src/http/ngx_http_core_module.c @@<span class="w"> </span>-4402,6<span class="w"> </span>+4410,16<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_root<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span> <span class="o">}</span> +static<span class="w"> </span>char<span class="w"> </span>* +ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span> +<span class="o">{</span> +<span class="w"> </span>ngx_http_core_loc_conf_t<span class="w"> </span>*clcf<span class="w"> </span><span class="o">=</span><span class="w"> </span>conf<span class="p">;</span> +<span class="w"> </span>ngx_str_t<span class="w"> </span>*value<span class="w"> </span><span class="o">=</span><span class="w"> </span>cf-&gt;args-&gt;elts<span class="p">;</span> +<span class="w"> </span>clcf-&gt;dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span 
class="o">]</span><span class="p">;</span> +<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_CONF_OK<span class="p">;</span> +<span class="o">}</span> + + static<span class="w"> </span>ngx_http_method_name_t<span class="w"> </span>ngx_methods_names<span class="o">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="o">{</span><span class="w"> </span><span class="o">(</span>u_char<span class="w"> </span>*<span class="o">)</span><span class="w"> </span><span class="s2">&quot;GET&quot;</span>,<span class="w"> </span><span class="o">(</span>uint32_t<span class="o">)</span><span class="w"> </span>~NGX_HTTP_GET<span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="o">{</span><span class="w"> </span><span class="o">(</span>u_char<span class="w"> </span>*<span class="o">)</span><span class="w"> </span><span class="s2">&quot;HEAD&quot;</span>,<span class="w"> </span><span class="o">(</span>uint32_t<span class="o">)</span><span class="w"> </span>~NGX_HTTP_HEAD<span class="w"> </span><span class="o">}</span>, </pre></div> <p>The goal here is to just store the dump string on this conf object. Then while serving the request we can check if this is set and if so, respond to the request with this string.</p> <p>This still clearly won't build because we didn't modify this conf object. 
But let's run <code>make</code> anyway.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>make<span class="w"> </span>-f<span class="w"> </span>objs/Makefile make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Entering<span class="w"> </span>directory<span class="w"> </span><span class="s1">&#39;/home/phil/vendor/nginx&#39;</span> cc<span class="w"> </span>-c<span class="w"> </span>-pipe<span class="w"> </span>-O<span class="w"> </span>-W<span class="w"> </span>-Wall<span class="w"> </span>-Wpointer-arith<span class="w"> </span>-Wno-unused-parameter<span class="w"> </span>-Werror<span class="w"> </span>-g<span class="w"> </span>-I<span class="w"> </span>src/core<span class="w"> </span>-I<span class="w"> </span>src/event<span class="w"> </span>-I<span class="w"> </span>src/event/modules<span class="w"> </span>-I<span class="w"> </span>src/os/unix<span class="w"> </span>-I<span class="w"> </span>objs<span class="w"> </span>-I<span class="w"> </span>src/http<span class="w"> </span>-I<span class="w"> </span>src/http/modules<span class="w"> </span><span class="se">\</span> <span class="w"> </span>-o<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="w"> </span><span class="se">\</span> <span class="w"> </span>src/http/ngx_http_core_module.c src/http/ngx_http_core_module.c:337:7:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_dump<span class="w"> </span>undeclared<span class="w"> </span>here<span class="w"> </span><span class="o">(</span>not<span class="w"> </span><span class="k">in</span><span class="w"> </span>a<span class="w"> </span><span class="k">function</span><span class="o">)</span><span class="p">;</span><span class="w"> </span>did<span class="w"> </span>you<span class="w"> </span>mean<span class="w"> </span>ngx_http_core_type? 
<span class="m">337</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_dump, <span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~~~~~~~~~~~~~~~~~~ <span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_type src/http/ngx_http_core_module.c:<span class="w"> </span>In<span class="w"> </span><span class="k">function</span><span class="w"> </span>ngx_http_core_dump: src/http/ngx_http_core_module.c:4418:9:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_loc_conf_t<span class="w"> </span><span class="o">{</span>aka<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="o">}</span><span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>member<span class="w"> </span>named<span class="w"> </span>dump <span class="m">4418</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf-&gt;dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span> <span class="w"> </span><span class="p">|</span><span class="w"> </span>^~ src/http/ngx_http_core_module.c:4418:5:<span class="w"> </span>error:<span class="w"> </span>statement<span class="w"> </span>with<span class="w"> </span>no<span class="w"> </span>effect<span class="w"> </span><span class="o">[</span>-Werror<span class="o">=</span>unused-value<span class="o">]</span> <span class="m">4418</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf-&gt;dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span> <span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~ At<span class="w"> </span>top<span class="w"> </span>level: 
src/http/ngx_http_core_module.c:4414:1:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_dump<span class="w"> </span>defined<span class="w"> </span>but<span class="w"> </span>not<span class="w"> </span>used<span class="w"> </span><span class="o">[</span>-Werror<span class="o">=</span>unused-function<span class="o">]</span> <span class="m">4414</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span> <span class="w"> </span><span class="p">|</span><span class="w"> </span>^~~~~~~~~~~~~~~~~~~~~ cc1:<span class="w"> </span>all<span class="w"> </span>warnings<span class="w"> </span>being<span class="w"> </span>treated<span class="w"> </span>as<span class="w"> </span>errors make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>objs/Makefile:834:<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">1</span> make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Leaving<span class="w"> </span>directory<span class="w"> </span><span class="s1">&#39;/home/phil/vendor/nginx&#39;</span> make:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>Makefile:10:<span class="w"> </span>build<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">2</span> </pre></div> <p>The dump handler is undeclared. While copying <code>ngx_http_core_root</code> earlier I saw that there was a forward declaration toward the top. 
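</p> <p>As an aside, the "undeclared here (not in a function)" error is easy to reproduce outside nginx. The following is a minimal standalone sketch, not nginx code: <code>command_t</code>, <code>commands</code>, <code>handle_dump</code> and <code>run_command</code> are hypothetical names chosen for illustration. It shows a file-scope table that refers to a handler defined further down the file, which only compiles because of the forward declaration.</p>

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a command table; none of these names
 * come from nginx itself. */
typedef const char *(*handler_fn)(const char *arg);

typedef struct {
    const char *name;
    handler_fn  handler;
} command_t;

/* Forward declaration: the table below is initialized at file scope,
 * so the compiler must already know that handle_dump exists. */
static const char *handle_dump(const char *arg);

static command_t commands[] = {
    { "dump", handle_dump },
    { NULL, NULL }            /* terminator entry */
};

/* The definition can now live anywhere later in the file. */
static const char *handle_dump(const char *arg)
{
    return arg;
}

/* Look a command up by name and run its handler; returns NULL when
 * no command with that name is registered. */
static const char *run_command(const char *name, const char *arg)
{
    for (command_t *c = commands; c->name != NULL; c++) {
        if (strcmp(c->name, name) == 0) {
            return c->handler(arg);
        }
    }
    return NULL;
}
```

<p>With GCC, deleting the forward declaration line from this sketch should produce the same kind of "undeclared here" error as the build output above.</p> <p>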
Let's copy that as well and see if that fixes anything.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c index<span class="w"> </span>9b94b328..430e1256<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.c +++<span class="w"> </span>b/src/http/ngx_http_core_module.c @@<span class="w"> </span>-56,6<span class="w"> </span>+56,7<span class="w"> </span>@@<span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_listen<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd, static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_server_name<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd, <span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span> static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_root<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span> +static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_dump<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd,<span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span> <span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_limit_except<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span 
class="w"> </span>ngx_command_t<span class="w"> </span>*cmd, <span class="w"> </span>void<span class="w"> </span>*conf<span class="o">)</span><span class="p">;</span> <span class="w"> </span>static<span class="w"> </span>char<span class="w"> </span>*ngx_http_core_set_aio<span class="o">(</span>ngx_conf_t<span class="w"> </span>*cf,<span class="w"> </span>ngx_command_t<span class="w"> </span>*cmd, </pre></div> <p>And build:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>make make<span class="w"> </span>-f<span class="w"> </span>objs/Makefile make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Entering<span class="w"> </span>directory<span class="w"> </span><span class="s1">&#39;/home/phil/vendor/nginx&#39;</span> cc<span class="w"> </span>-c<span class="w"> </span>-pipe<span class="w"> </span>-O<span class="w"> </span>-W<span class="w"> </span>-Wall<span class="w"> </span>-Wpointer-arith<span class="w"> </span>-Wno-unused-parameter<span class="w"> </span>-Werror<span class="w"> </span>-g<span class="w"> </span>-I<span class="w"> </span>src/core<span class="w"> </span>-I<span class="w"> </span>src/event<span class="w"> </span>-I<span class="w"> </span>src/event/modules<span class="w"> </span>-I<span class="w"> </span>src/os/unix<span class="w"> </span>-I<span class="w"> </span>objs<span class="w"> </span>-I<span class="w"> </span>src/http<span class="w"> </span>-I<span class="w"> </span>src/http/modules<span class="w"> </span><span class="se">\</span> <span class="w"> </span>-o<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="w"> </span><span class="se">\</span> src/http/ngx_http_core_module.c src/http/ngx_http_core_module.c:<span class="w"> </span>In<span class="w"> </span><span class="k">function</span><span class="w"> </span>ngx_http_core_dump: src/http/ngx_http_core_module.c:4419:9:<span class="w"> </span>error:<span class="w"> </span>ngx_http_core_loc_conf_t<span 
class="w"> </span><span class="o">{</span>aka<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="o">}</span><span class="w"> </span>has<span class="w"> </span>no<span class="w"> </span>member<span class="w"> </span>named<span class="w"> </span>dump <span class="m">4419</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>clcf-&gt;dump<span class="w"> </span><span class="o">=</span><span class="w"> </span>value<span class="o">[</span><span class="m">1</span><span class="o">]</span><span class="p">;</span> <span class="w"> </span><span class="p">|</span><span class="w"> </span>^~ make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>objs/Makefile:834:<span class="w"> </span>objs/src/http/ngx_http_core_module.o<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">1</span> make<span class="o">[</span><span class="m">1</span><span class="o">]</span>:<span class="w"> </span>Leaving<span class="w"> </span>directory<span class="w"> </span><span class="s1">&#39;/home/phil/vendor/nginx&#39;</span> make:<span class="w"> </span>***<span class="w"> </span><span class="o">[</span>Makefile:10:<span class="w"> </span>build<span class="o">]</span><span class="w"> </span>Error<span class="w"> </span><span class="m">2</span> </pre></div> <p>Perfect. 
Now let's add <code>dump</code> as a member to this conf object.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>ngx_http_core_loc_conf_t<span class="se">\;</span> src/http/ngx_http_core_module.h:typedef<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="w"> </span>ngx_http_core_loc_conf_t<span class="p">;</span> </pre></div> <p>Let's just clone the <code>root</code> member:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.h<span class="w"> </span>b/src/http/ngx_http_core_module.h index<span class="w"> </span>2aadae7f..6b1b178b<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.h +++<span class="w"> </span>b/src/http/ngx_http_core_module.h @@<span class="w"> </span>-333,6<span class="w"> </span>+333,7<span class="w"> </span>@@<span class="w"> </span>struct<span class="w"> </span>ngx_http_core_loc_conf_s<span class="w"> </span><span class="o">{</span> /*<span class="w"> </span>location<span class="w"> </span>name<span class="w"> </span>length<span class="w"> </span><span class="k">for</span><span class="w"> </span>inclusive<span class="w"> </span>location<span class="w"> </span>with<span class="w"> </span>inherited<span class="w"> </span><span class="nb">alias</span><span class="w"> </span>*/ <span class="w"> </span>size_t<span class="w"> </span>alias<span class="p">;</span> <span class="w"> </span>ngx_str_t<span class="w"> </span>root<span class="p">;</span><span class="w"> </span>/*<span class="w"> </span>root,<span class="w"> </span><span class="nb">alias</span><span class="w"> </span>*/ +<span class="w"> </span>ngx_str_t<span class="w"> </span>dump<span class="p">;</span> <span class="w"> </span>ngx_str_t<span class="w"> </span>post_action<span 
class="p">;</span> <span class="w"> </span>ngx_array_t<span class="w"> </span>*root_lengths<span class="p">;</span> </pre></div> <p>Run <code>make</code> and it succeeds!</p> <p>Now we spend a few hours looking around for a good place to add a hook during a request. Ultimately, <code>ngx_http_core_find_config_phase</code> seems like a good place since only then will we be dealing with the struct we added <code>dump</code> to.</p> <p>Next step is figuring out how to send a response. Grepping for <code>response</code> isn't super useful, neither is <code>write</code>. But <code>send</code> has some pretty low-level but obvious behavior.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>grep<span class="w"> </span>send<span class="se">\(</span> src/mail/ngx_mail.h:void<span class="w"> </span>ngx_mail_send<span class="o">(</span>ngx_event_t<span class="w"> </span>*wev<span class="o">)</span><span class="p">;</span> src/mail/ngx_mail_auth_http_module.c:<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ngx_send<span class="o">(</span>c,<span class="w"> </span>ctx-&gt;request-&gt;pos,<span class="w"> </span>size<span class="o">)</span><span class="p">;</span><span class="o">)</span> ... </pre></div> <p>That second result looks promising. Looking at that file it looks like we need an object that has a <code>-&gt;data</code> member. In <code>src/http/ngx_http_core_module.c</code> I noticed that the request object has a member that looks interesting: <code>r-&gt;connection-&gt;write-&gt;data</code>. And if we look up the signature we just need to also send <code>ngx_send</code> a string and a length.</p> <p>Thankfully we already have that from our <code>dump</code> member. 
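</p> <p>The pointer-plus-length pairing is worth a quick illustration. The sketch below is standalone C under stated assumptions: <code>counted_str_t</code>, <code>COUNTED_STR</code> and <code>fake_send</code> are hypothetical names that only mirror the shape of the <code>dump</code> member (a data pointer and a length), and nothing here calls a real nginx API.</p>

```c
#include <stddef.h>
#include <string.h>

/* A counted string: a length plus a pointer, with no reliance on a
 * trailing NUL byte. Hypothetical type for illustration only. */
typedef struct {
    size_t               len;
    const unsigned char *data;
} counted_str_t;

/* Build a counted string from a string literal; sizeof includes the
 * trailing NUL, so subtract one to get the visible length. */
#define COUNTED_STR(lit) { sizeof(lit) - 1, (const unsigned char *) lit }

/* Stand-in for a send call: copy at most `cap` bytes of `data` into
 * `out` and return how many bytes were "sent". */
static size_t fake_send(unsigned char *out, size_t cap,
                        const unsigned char *data, size_t len)
{
    size_t n = len < cap ? len : cap;
    memcpy(out, data, n);
    return n;
}
```

<p>A sender written this way never has to scan for a terminator; it is handed the exact byte count up front.</p> <p>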
So let's try something simple:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c index<span class="w"> </span>9b94b328..bd58788b<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.c +++<span class="w"> </span>b/src/http/ngx_http_core_module.c @@<span class="w"> </span>-989,6<span class="w"> </span>+996,11<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r, <span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> <span class="o">}</span> + +<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf-&gt;dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span> +<span class="w"> </span>ngx_send<span class="o">(</span>r-&gt;connection-&gt;write-&gt;data,<span class="w"> </span>clcf-&gt;dump.data,<span class="w"> </span>clcf-&gt;dump.len<span class="o">)</span><span class="p">;</span> +<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> +<span class="w"> </span><span class="o">}</span> </pre></div> <p>Run <code>make</code> and it's good! 
Let's turn off the nginx daemon and worker processes so it's easier to quit as we're iterating.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>conf/ diff<span class="w"> </span>--git<span class="w"> </span>a/conf/nginx.conf<span class="w"> </span>b/conf/nginx.conf index<span class="w"> </span>29bc085f..7cce7d65<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/conf/nginx.conf +++<span class="w"> </span>b/conf/nginx.conf @@<span class="w"> </span>-1,4<span class="w"> </span>+1,5<span class="w"> </span>@@ - +daemon<span class="w"> </span>off<span class="p">;</span> +master_process<span class="w"> </span>off<span class="p">;</span> <span class="w"> </span><span class="c1">#user nobody;</span> <span class="w"> </span>worker_processes<span class="w"> </span><span class="m">1</span><span class="p">;</span> </pre></div> <p>Now run <code>./objs/nginx -c $(pwd)/conf/nginx.conf</code>. Try to curl:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020 curl:<span class="w"> </span><span class="o">(</span><span class="m">1</span><span class="o">)</span><span class="w"> </span>Received<span class="w"> </span>HTTP/0.9<span class="w"> </span>when<span class="w"> </span>not<span class="w"> </span>allowed </pre></div> <p>Huh, that's unexpected. Let's try using telnet to get the whole raw response:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>telnet<span class="w"> </span>localhost<span class="w"> </span><span class="m">2020</span> Trying<span class="w"> </span>::1... telnet:<span class="w"> </span>connect<span class="w"> </span>to<span class="w"> </span>address<span class="w"> </span>::1:<span class="w"> </span>Connection<span class="w"> </span>refused Trying<span class="w"> </span><span class="m">127</span>.0.0.1... 
Connected<span class="w"> </span>to<span class="w"> </span>localhost. Escape<span class="w"> </span>character<span class="w"> </span>is<span class="w"> </span><span class="s1">&#39;^]&#39;</span>. GET<span class="w"> </span>/ It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>good<span class="w"> </span>Thursday. </pre></div> <p>Oh man. That's super cool. Unfortunately it's also not valid HTTP. It seems like if we're using <code>ngx_send</code> we'll have to set the HTTP response headers manually.</p> <p>If we're going to pass a literal string to <code>ngx_send</code> we'll have to convert it to an <code>ngx_str_t</code>. Judging from <code>src/core/ngx_string.h</code> the <code>ngx_string</code> macro should be able to do this.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c index<span class="w"> </span>9b94b328..1a1baccd<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.c +++<span class="w"> </span>b/src/http/ngx_http_core_module.c @@<span class="w"> </span>-989,6<span class="w"> </span>+996,13<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r, <span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> <span class="w"> </span><span class="o">}</span> + +<span class="w"> </span>static<span class="w"> </span>ngx_str_t<span class="w"> </span><span class="nv">header</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">&quot;HTTP/1.0 200 OK\r\n\r\n&quot;</span><span class="o">)</span><span class="p">;</span> +<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf-&gt;dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span> +<span class="w"> </span>ngx_send<span class="o">(</span>r-&gt;connection-&gt;write-&gt;data,<span class="w"> </span>header.data,<span class="w"> </span>header.len<span class="o">)</span><span class="p">;</span> +<span class="w"> </span>ngx_send<span class="o">(</span>r-&gt;connection-&gt;write-&gt;data,<span class="w"> </span>clcf-&gt;dump.data,<span class="w"> </span>clcf-&gt;dump.len<span class="o">)</span><span class="p">;</span> +<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> +<span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span><span class="nv">rc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>NGX_DONE<span class="o">)</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>ngx_http_clear_location<span class="o">(</span>r<span class="o">)</span><span class="p">;</span> <span class="o">}</span> </pre></div> <p>Compile, run and curl:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020 </pre></div> <p>Huh. It's no longer complaining about HTTP/0.9 but it's now hanging. Let's try verbose curling.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-vvv<span class="w"> </span>localhost:2020 *<span class="w"> </span>Trying<span class="w"> </span>::1:2020... 
*<span class="w"> </span>connect<span class="w"> </span>to<span class="w"> </span>::1<span class="w"> </span>port<span class="w"> </span><span class="m">2020</span><span class="w"> </span>failed:<span class="w"> </span>Connection<span class="w"> </span>refused *<span class="w"> </span>Trying<span class="w"> </span><span class="m">127</span>.0.0.1:2020... *<span class="w"> </span>Connected<span class="w"> </span>to<span class="w"> </span>localhost<span class="w"> </span><span class="o">(</span><span class="m">127</span>.0.0.1<span class="o">)</span><span class="w"> </span>port<span class="w"> </span><span class="m">2020</span><span class="w"> </span><span class="o">(</span><span class="c1">#0)</span> &gt;<span class="w"> </span>GET<span class="w"> </span>/<span class="w"> </span>HTTP/1.1 &gt;<span class="w"> </span>Host:<span class="w"> </span>localhost:2020 &gt;<span class="w"> </span>User-Agent:<span class="w"> </span>curl/7.71.1 &gt;<span class="w"> </span>Accept:<span class="w"> </span>*/* &gt; *<span class="w"> </span>Mark<span class="w"> </span>bundle<span class="w"> </span>as<span class="w"> </span>not<span class="w"> </span>supporting<span class="w"> </span>multiuse *<span class="w"> </span>HTTP<span class="w"> </span><span class="m">1</span>.0,<span class="w"> </span>assume<span class="w"> </span>close<span class="w"> </span>after<span class="w"> </span>body &lt;<span class="w"> </span>HTTP/1.0<span class="w"> </span><span class="m">200</span><span class="w"> </span>OK </pre></div> <p>That's really weird. But I noticed there was a <code>ngx_http_finalize_request</code> function that other parts of the code were calling. 
Let's try adding that.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>--no-pager<span class="w"> </span>diff<span class="w"> </span>src diff<span class="w"> </span>--git<span class="w"> </span>a/src/http/ngx_http_core_module.c<span class="w"> </span>b/src/http/ngx_http_core_module.c index<span class="w"> </span>9b94b328..1a1baccd<span class="w"> </span><span class="m">100644</span> ---<span class="w"> </span>a/src/http/ngx_http_core_module.c +++<span class="w"> </span>b/src/http/ngx_http_core_module.c @@<span class="w"> </span>-989,6<span class="w"> </span>+996,14<span class="w"> </span>@@<span class="w"> </span>ngx_http_core_find_config_phase<span class="o">(</span>ngx_http_request_t<span class="w"> </span>*r, <span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_HTTP_REQUEST_ENTITY_TOO_LARGE<span class="o">)</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> <span class="w"> </span><span class="o">}</span> + +<span class="w"> </span>static<span class="w"> </span>ngx_str_t<span class="w"> </span><span class="nv">header</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>ngx_string<span class="o">(</span><span class="s2">&quot;HTTP/1.0 200 OK\r\n\r\n&quot;</span><span class="o">)</span><span class="p">;</span> +<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="o">(</span>clcf-&gt;dump.len<span class="o">)</span><span class="w"> </span><span class="o">{</span> +<span class="w"> </span>ngx_send<span class="o">(</span>r-&gt;connection-&gt;write-&gt;data,<span class="w"> </span>header.data,<span class="w"> </span>header.len<span class="o">)</span><span class="p">;</span> +<span class="w"> </span>ngx_send<span class="o">(</span>r-&gt;connection-&gt;write-&gt;data,<span class="w"> </span>clcf-&gt;dump.data,<span class="w"> 
</span>clcf-&gt;dump.len<span class="o">)</span><span class="p">;</span> +<span class="w"> </span>ngx_http_finalize_request<span class="o">(</span>r,<span class="w"> </span>NGX_DONE<span class="o">)</span><span class="p">;</span> +<span class="w"> </span><span class="k">return</span><span class="w"> </span>NGX_OK<span class="p">;</span> +<span class="w"> </span><span class="o">}</span> </pre></div> <p>Build, run, curl. Still hanging. Looking into the source code of <code>ngx_http_finalize_request</code> it seems like there's a case where the connection is completely closed if you pass in <code>NGX_HTTP_CLOSE</code>. Let's try that in place of <code>NGX_DONE</code>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:2020 It<span class="w"> </span>was<span class="w"> </span>a<span class="w"> </span>good<span class="w"> </span>Thursday. </pre></div> <p>Well hot dog, it works.</p> <h3 id="reflection">Reflection</h3><p>Is this a good way to implement commands in nginx? No. While I knew a bit about nginx modules as a user, it's clear that as a developer this command could have been implemented much more cleanly as a module too.</p> <p>There also has to be higher-level tooling for constructing responses rather than writing out headers manually.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Been wanting to write some posts like this for a long time showing some techniques for hacking on an unfamiliar project using very basic programming/Linux tools. 
In this post it&#39;s nginx<a href="https://t.co/t7Y43Zmxhk">https://t.co/t7Y43Zmxhk</a> <a href="https://t.co/EOatURm5wx">pic.twitter.com/EOatURm5wx</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1378906317004361732?ref_src=twsrc%5Etfw">April 5, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/learning-a-new-codebase-hacking-nginx.htmlSun, 04 Apr 2021 00:00:00 +0000How to get better at recursionhttp://notes.eatonphil.com/practicing-recursion.html<p>tldr; reimplement standard library functions in your favorite language <em>without loops</em>.</p> <h3 id="background">Background</h3><p>For a few years after college I spent a lot of free time doing projects in Standard ML and Scheme. As a result I got really comfortable doing recursion. The two big reasons for this are 1) neither Standard ML nor Scheme has loops and 2) they both have very small standard libraries. (Ok, they have loops. They're just so limited as to be useless.)</p> <p>I ended up building <a href="https://github.com/eatonphil/ponyo">a standard library</a> for Standard ML including string functions (contains, indexOf, count, replace, etc.), an HTTP server and client, a hash table, a binary search tree, parts of a Standard ML parser, and <a href="https://ponyo.org/reference">so on</a>.</p> <p>All of this without loops.</p> <h3 id="strategy">Strategy</h3><p>The good news (if you don't want to learn a new language) is that you don't have to take up Standard ML or Scheme to get better at recursion. But you do need to dedicate some time to <em>practicing recursion</em> to get better at it.</p> <p>My recommendation would be to pick 10-20 string or array functions out of your favorite language's standard library and reimplement them without loops. (Obviously, start simple and just pick one. 
But don't stop there.)</p> <h3 id="some-examples">Some examples</h3><p>Here's an example reimplementation of <code>indexOf</code> in JavaScript:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">indexOf</span><span class="p">(</span><span class="nx">input</span><span class="p">,</span><span class="w"> </span><span class="nx">toMatch</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">test</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">index</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">input</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">toMatch</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">test</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">index</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="p">(</span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="o">+</span><span class="nx">offset</span><span class="p">]</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nx">toMatch</span><span class="p">[</span><span class="nx">offset</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">test</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="nx">toMatch</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="o">+</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="nx">index</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">test</span><span class="o">+</span><span class="nx">input</span><span class="p">[</span><span class="nx">index</span><span class="o">+</span><span class="nx">offset</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">helper</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span 
class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s2">&quot;&quot;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Or here's an example immutable reimplementation of <code>insert</code> in Python:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">insert</span><span class="p">(</span><span class="n">arr</span><span class="p">,</span> <span class="n">index</span><span class="p">,</span> <span class="n">item</span><span class="p">):</span> <span class="k">def</span> <span class="nf">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="p">,</span> <span class="n">accum</span><span class="p">):</span> <span class="k">if</span> <span class="n">currentIndex</span> <span class="o">==</span> <span class="nb">len</span><span class="p">(</span><span class="n">arr</span><span class="p">):</span> <span class="k">return</span> <span class="n">accum</span> <span class="k">if</span> <span class="n">currentIndex</span> <span class="o">&lt;</span> <span class="n">index</span><span class="p">:</span> <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">arr</span><span class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span> <span class="k">if</span> <span class="n">currentIndex</span> <span class="o">==</span> <span class="n">index</span><span class="p">:</span> <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">item</span><span class="p">,</span> <span class="n">arr</span><span 
class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span> <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="n">currentIndex</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">accum</span> <span class="o">+</span> <span class="p">[</span><span class="n">arr</span><span class="p">[</span><span class="n">currentIndex</span><span class="p">]])</span> <span class="k">return</span> <span class="n">helper</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="p">[])</span> </pre></div> <p class="note"> You're going to find an edge case and that's alright. The important part at the moment is practicing recursion. </p><p>For bonus points, avoid all mutation in your implementations and use only tail recursion.</p> <p>Happy recursion!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Reimplementing standard library functions without for loops is a great way to get better at recursion and you don&#39;t need to use a functional programming language to do so<a href="https://t.co/JiPnXMQW3l">https://t.co/JiPnXMQW3l</a> <a href="https://t.co/MHwX5t70HT">pic.twitter.com/MHwX5t70HT</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1368602496168497154?ref_src=twsrc%5Etfw">March 7, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/practicing-recursion.htmlSun, 07 Mar 2021 00:00:00 +0000Extending gosql to support LIMIT and OFFSEThttp://notes.eatonphil.com/extending-gosql-to-support-limit-and-offset.html<p>It's been a few months since I picked up <a href="https://github.com/eatonphil/gosql">gosql</a> and I wanted to use it to prototype a SQL interface for data stored in S3. But one critical feature missing from gosql is <code>LIMIT</code> and <code>OFFSET</code> support. 
This post walks through the few key changes to gosql to support <code>LIMIT</code> and <code>OFFSET</code>.</p> <p>You can find <a href="https://github.com/eatonphil/gosql/commit/9405e433ec51f8f1d72c9b2e8f45109d738edec4">this commit in full on GitHub</a>.</p> <p class="note"> This post builds on top of a series on building a SQL database from scratch in Golang. <!-- forgive me, for I have sinned --> <br /> <a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a> <br /> <a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a> <br /> <a href="/database-basics-indexes.html">3. indexes</a> <br /> <a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a> </p><h3 id="lexing">Lexing</h3><p>The first step is to update the lexer to know about the <code>LIMIT</code> and <code>OFFSET</code> keywords. Since we already have a generalized method of lexing any keyword from an array (see <code>lexer.go:lexKeyword</code>), this is really easy. 
Just add a new <code>Keyword</code>:</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">37</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">37</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">OnKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;on&quot;</span> <span class="w"> </span><span class="nx">PrimarykeyKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;primary key&quot;</span> <span class="w"> </span><span class="nx">NullKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;null&quot;</span> <span class="o">+</span><span class="w"> </span><span class="nx">LimitKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;limit&quot;</span> <span class="o">+</span><span class="w"> </span><span class="nx">OffsetKeyword</span><span class="w"> </span><span class="nx">Keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;offset&quot;</span> <span class="w"> </span><span class="p">)</span> </pre></div> <p>And then add these two new enums to the list of <code>Keyword</code>s to lex:</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">261</span><span 
class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">263</span><span class="p">,</span><span class="mi">8</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="nx">lexKeyword</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">OnKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">PrimarykeyKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">NullKeyword</span><span class="p">,</span> <span class="o">+</span><span class="w"> </span><span class="nx">LimitKeyword</span><span class="p">,</span> <span class="o">+</span><span class="w"> </span><span class="nx">OffsetKeyword</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> </pre></div> <p>That's it for the lexer.</p> <h3 id="parsing">Parsing</h3><p>Before we can parse limit and offset into the AST, we have to modify our AST struct to support these two fields in ast.go:</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">54</span><span 
class="p">,</span><span class="mi">9</span><span class="w"> </span><span class="o">+</span><span class="mi">54</span><span class="p">,</span><span class="mi">11</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectItem</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="nx">Item</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">SelectItem</span> <span class="o">-</span><span class="w"> </span><span class="nx">From</span><span class="w"> </span><span class="o">*</span><span class="nx">Token</span> <span class="o">-</span><span class="w"> </span><span class="nx">Where</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span> <span class="o">+</span><span class="w"> </span><span class="nx">Item</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">SelectItem</span> <span class="o">+</span><span class="w"> </span><span class="nx">From</span><span class="w"> </span><span class="o">*</span><span class="nx">Token</span> <span class="o">+</span><span class="w"> </span><span class="nx">Where</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span> <span class="o">+</span><span class="w"> </span><span class="nx">Limit</span><span class="w"> </span><span class="o">*</span><span class="nx">Expression</span> <span class="o">+</span><span class="w"> </span><span class="nx">Offset</span><span class="w"> 
</span><span class="o">*</span><span class="nx">Expression</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And to be a good citizen, we'll fix up the <code>GenerateCode</code> helper function (for pretty-printing the AST) to show <code>LIMIT</code> and <code>OFFSET</code>.</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">73</span><span class="p">,</span><span class="mi">17</span><span class="w"> </span><span class="o">+</span><span class="mi">75</span><span class="p">,</span><span class="mi">24</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">ss</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="nx">GenerateCode</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="o">-</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;SELECT\n&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span 
class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,\n&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;\nFROM\n\t\&quot;%s\&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;\nFROM\n\t\&quot;%s\&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">From</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span 
class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;\nWHERE\n\t%s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">())</span> <span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;\nWHERE\n\t&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Where</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="o">-</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;SELECT\n%s%s%s;&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Join</span><span class="p">(</span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;,\n&quot;</span><span class="p">),</span><span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">where</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span 
class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;\nLIMIT\n\t&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Limit</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="s">&quot;\nOFFSET\n\t&quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ss</span><span class="p">.</span><span class="nx">Offset</span><span class="p">.</span><span class="nx">GenerateCode</span><span class="p">()</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;;&quot;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">type</span><span class="w"> </span><span class="nx">ColumnDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> </pre></div> <p>That's it for 
modifying the AST itself. Now we can modify the select statement parser to look for these two new sections. It's pretty simple: for both <code>LIMIT</code> and <code>OFFSET</code> first check if they exist in the current statement and then try to parse the expression after them, in parser.go:</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">285</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">288</span><span class="p">,</span><span class="mi">30</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="nx">Parser</span><span class="p">)</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimi</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> 
</span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">limitToken</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">offsetToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected LIMIT value&quot;</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span 
class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">limit</span> <span class="o">+</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">offsetToken</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span 
class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected OFFSET value&quot;</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">offset</span> <span class="o">+</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span 
class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And the last tricky bit is to make sure that the earlier optional <code>parseExpression</code> calls, such as the one for <code>WHERE</code>, know that they can be delimited by <code>LIMIT</code> and <code>OFFSET</code> (this delimiter awareness is just how the parser works):</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">273</span><span class="p">,</span><span class="mi">9</span><span class="w"> </span><span class="o">+</span><span class="mi">273</span><span class="p">,</span><span class="mi">12</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">p</span><span class="w"> </span><span class="nx">Parser</span><span class="p">)</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimi</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="nx">limitToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">LimitKeyword</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="nx">offsetToken</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">OffsetKeyword</span><span class="p">)</span> <span class="o">+</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="o">-</span><span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">Token</span><span class="p">{</span><span class="nx">limitToken</span><span class="p">,</span><span class="w"> </span><span class="nx">offsetToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">p</span><span class="p">.</span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected WHERE conditionals&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> </pre></div> <p>That's it for parsing!</p> <h3 id="runtime">Runtime</h3><p>Gosql has just one storage backend currently: an in-memory store. To support <code>LIMIT</code> and <code>OFFSET</code> we need to evaluate both expressions if they exist. 
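</p> <p>One way to picture what <code>LIMIT</code> and <code>OFFSET</code> do at runtime, independent of gosql's actual code: skip the first <code>OFFSET</code> rows that pass the filter, then keep at most <code>LIMIT</code> of the rows after that. The sketch below is only an illustration under stated assumptions. The <code>window</code> helper, its signature, and the plain <code>[]int</code> rows are hypothetical and are not part of gosql, and it assumes both values have already been evaluated and checked to be non-negative.</p>

```go
package main

import "fmt"

// window is a hypothetical helper (not part of gosql). It returns the rows
// that pass keep, skipping the first offset matches and returning at most
// limit of the matches after that. Both limit and offset are assumed to be
// non-negative.
func window(rows []int, keep func(int) bool, limit, offset int) []int {
	var out []int
	matched := 0 // number of rows that have passed the filter so far
	for _, r := range rows {
		if len(out) >= limit {
			break // the window is full, no need to look at more rows
		}
		if !keep(r) {
			continue // the row fails the filter, so it does not count
		}
		matched++
		if matched > offset {
			out = append(out, r) // past the skipped prefix, keep the row
		}
	}
	return out
}

func main() {
	rows := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	even := func(n int) bool { return n%2 == 0 }
	// Like: SELECT ... WHERE n % 2 = 0 LIMIT 2 OFFSET 1
	fmt.Println(window(rows, even, 2, 1)) // prints [4 6]
}
```

<p>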
Then while we're iterating through table rows, after testing whether each row passes the <code>WHERE</code> filter, we'll check whether the number of rows that have passed the <code>WHERE</code> filter so far falls within the range from <code>OFFSET</code> to <code>LIMIT + OFFSET</code>. If it doesn't, we'll skip the row. This happens in memory.go:</p> <div class="highlight"><pre><span></span><span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">587</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">587</span><span class="p">,</span><span class="mi">33</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="w"> </span><span 
class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">Limit</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">v</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">limit</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span 
class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Invalid, negative limit&quot;</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="nx">v</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">Offset</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span 
class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">v</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Invalid, negative offset&quot;</span><span class="p">)</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="o">+</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="err">@@</span><span class="w"> </span><span class="o">-</span><span class="mi">602</span><span class="p">,</span><span class="mi">6</span><span class="w"> </span><span class="o">+</span><span class="mi">629</span><span class="p">,</span><span class="mi">13</span><span class="w"> </span><span class="err">@@</span><span class="w"> </span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="o">+</span><span class="w"> </span><span class="nx">rowIndex</span><span class="o">++</span> <span class="o">+</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span 
class="nx">offset</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="k">continue</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nx">offset</span><span class="o">+</span><span class="nx">limit</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="o">+</span><span class="w"> </span><span class="k">break</span> <span class="o">+</span><span class="w"> </span><span class="p">}</span> <span class="o">+</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">finalItems</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">Exp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> </pre></div> <p class="note"> Just to call out explicitly, with <code>LIMIT</code> and <code>OFFSET</code> we still have to check every single row in the table (at least until we've reached the offset). This should clearly illustrate why paginating based on <code>LIMIT</code> and <code>OFFSET</code> is not a great idea for big datasets <a href="https://use-the-index-luke.com/sql/partial-results/fetch-next-page">compared to index-based pagination</a>. </p><p>That's all!</p> <h3 id="trying-it-out">Trying it out</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>cmd/main.go $<span class="w"> </span>./main Welcome<span class="w"> </span>to<span class="w"> </span>gosql. <span class="c1"># create table user (name text, age int);</span> ok <span class="c1"># insert into user values (&#39;meg&#39;, 2);</span> ok <span class="c1"># insert into user values (&#39;jerry&#39;, 2);</span> ok <span class="c1"># insert into user values (&#39;phil&#39;, 1);</span> ok <span class="c1"># select * from user;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age --------+------ <span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span> <span class="w"> </span>jerry<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span> <span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span> <span class="o">(</span><span class="m">3</span><span class="w"> </span>results<span class="o">)</span> ok <span class="c1"># select * from user limit 1;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span 
class="w"> </span>age -------+------ <span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span> ok <span class="c1"># select * from user where age=1 limit 1;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age -------+------ <span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span> ok <span class="c1"># select * from user where age=1 limit 4;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age -------+------ <span class="w"> </span>phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span> ok <span class="c1"># select * from user where age=2 limit 1;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age -------+------ <span class="w"> </span>meg<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span> ok <span class="c1"># select * from user where age=2 limit 1 offset 1;</span> <span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>age --------+------ <span class="w"> </span>jerry<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span> <span class="o">(</span><span class="m">1</span><span class="w"> </span>result<span class="o">)</span> ok </pre></div> <p>Not so hard to hack is it? 
Make sure to include some tests!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Working on a prototype SQL-based explorer for data stored in S3 and I needed OFFSET/LIMIT support in the gosql parser. Wrote up a short post on how you can hack in additional syntax and functionality into this SQL engine written in Go.<a href="https://t.co/PyVozTPZ5S">https://t.co/PyVozTPZ5S</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1353372050023456768?ref_src=twsrc%5Etfw">January 24, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/extending-gosql-to-support-limit-and-offset.htmlSat, 23 Jan 2021 00:00:00 +0000The year in books: 20 to recommend in 2020http://notes.eatonphil.com/year-in-books-2020.html<p>This year I finished 47 books, up from last year but not a personal best. The breakdown was 17 non-fiction and 30 fiction. Another 20-30 remain started but unfinished this year.</p> <h3 id="non-fiction">Non-fiction</h3><p>The 8 non-fiction books I most recommend are:</p> <ul> <li><a href="https://www.goodreads.com/book/show/51034048-fashionopolis">Fashionopolis: The Price of Fast Fashion and the Future of Clothes</a> (Must read)</li> <li><a href="https://www.goodreads.com/book/show/48566725-effective-python">Effective Python: 90 Specific Ways to Write Better Python</a> (Must read; truly excellent for Python programmers, I recommend this to anyone I work with)</li> <li><a href="https://www.goodreads.com/book/show/93904.The_Machine_That_Changed_the_World">The Machine that Changed the World</a> (Must read)</li> <li><a href="https://www.goodreads.com/book/show/16043511-europe">Europe: The Struggle for Supremacy from 1453 to the Present</a></li> <li><a href="https://www.goodreads.com/book/show/19606799-wind-sand-and-stars">Wind, Sand and Stars</a></li> <li><a href="https://www.goodreads.com/book/show/11169043-american-colossus">American 
Colossus: The Triumph of Capitalism, 1865-1900</a></li> <li><a href="https://www.goodreads.com/book/show/2360599.Making_Common_Sense_of_Japan">Making Common Sense of Japan</a></li> <li><a href="https://www.goodreads.com/book/show/8155672-the-german-genius">The German Genius</a></li> </ul> <p>The 3 books I recommend you not waste time on are: "The Two Koreas", "The Price of Inequality", and "Ninety Percent of Everything: Inside Shipping".</p> <h4 id="the-whole-list">The whole list</h4><ul> <li><a href="https://www.goodreads.com/book/show/235560.The_Two_Koreas">The Two Koreas</a> by Don Oberdorfer<ul> <li>Interesting but not a huge fan, seemed pretty biased against South Korea somehow</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/88546.Forbidden_Nation">Forbidden Nation: A History of Taiwan</a> by Jonathan Manthorpe</li> <li><a href="https://www.goodreads.com/book/show/48566725-effective-python">Effective Python: 90 Specific Ways to Write Better Python</a> by Brett Slatkin</li> <li><a href="https://www.goodreads.com/book/show/43701534-a-philosophy-of-software-design">A Philosophy of Software Design</a> by John Ousterhout<ul> <li>Came as a recommendation from someone on Twitter, ultimately not a huge fan. Still looking for high quality books on software design</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/16031130-the-price-of-inequality">The Price of Inequality</a> by Joseph E. 
Stiglitz<ul> <li>Agreed with the premise but the book was incoherent and too self-assured</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/18930203-paris-reborn">Paris Reborn: Napoléon III, Baron Haussmann, and the Quest to Build a Modern City</a> by Stephane Kirkland</li> <li><a href="https://www.goodreads.com/book/show/51034048-fashionopolis">Fashionopolis: The Price of Fast Fashion and the Future of Clothes</a> by Dana Thomas</li> <li><a href="https://www.goodreads.com/book/show/6603103-a-moveable-feast">A Moveable Feast</a> by Ernest Hemingway<ul> <li>I normally love Hemingway's writing but this particular book was not very coherent</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/16043511-europe">Europe: The Struggle for Supremacy from 1453 to the Present</a> by Brendan Simms<ul> <li>Such an excellent introduction to the continent for Americans who otherwise don't have much background</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/19606799-wind-sand-and-stars">Wind, Sand and Stars</a> by Antoine de Saint-Exupéry<ul> <li>A beautiful memoir of flights by the author of The Little Prince, very similar in style to Hemingway</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/11169043-american-colossus">American Colossus: The Triumph of Capitalism, 1865-1900</a> by H.W. Brands<ul> <li>Baby's first primer on unions (I need more recommendations on the history of unions)</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/2360599.Making_Common_Sense_of_Japan">Making Common Sense of Japan</a> by Steven R. Reed<ul> <li>It can be difficult to find English translations of Korean and Japanese history by Korean and Japanese authors; this is a good one by an American professor</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/93904.The_Machine_That_Changed_the_World">The Machine that Changed the World</a> by James P. 
Womack<ul> <li>An excellent, well-researched history of automobile manufacturing in the US, Europe and Japan from the 1900s to 1990; how Japan ate everyone's lunch</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/40620.The_United_States_of_Europe">The United States of Europe</a> by T.R. Reid<ul> <li>Very light introduction to the European Union</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/7090.The_Soul_of_a_New_Machine">The Soul of a New Machine</a> by Tracy Kidder<ul> <li>Overhyped by the internets, but not bad</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/8155672-the-german-genius">The German Genius</a> by Peter Watson<ul> <li>Dense but excellent introduction to many famous Germans in many fields throughout time</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/18626537-ninety-percent-of-everything">Ninety Percent of Everything: Inside Shipping</a> by Rose George</li> </ul> <h3 id="fiction">Fiction</h3><p>I'm trying to read more from non-English authors. 
If you see non-English authors in the vein of these here that you can recommend, I'd love to hear from you.</p> <p>The 12 fiction books I most recommend are:</p> <ul> <li><a href="https://www.goodreads.com/book/show/11607290-planet-of-the-apes">Planet of the Apes</a> (Must read, yes even if you've seen the film)</li> <li><a href="https://www.goodreads.com/book/show/18882869-all-quiet-on-the-western-front">All Quiet on the Western Front</a> (Must read)</li> <li><a href="https://www.goodreads.com/book/show/26167126-the-mouse-that-roared">The Mouse That Roared</a> (Must read)</li> <li><a href="https://www.goodreads.com/book/show/25171354-the-dead-mountaineer-s-inn">The Dead Mountaineer's Inn</a></li> <li><a href="https://www.goodreads.com/book/show/17406654-the-golem-and-the-jinni">The Golem and the Jinni</a></li> <li><a href="https://www.goodreads.com/book/show/38886181-neverwhere">Neverwhere</a></li> <li><a href="https://www.goodreads.com/book/show/35901747-dubliners">Dubliners</a></li> <li><a href="https://www.goodreads.com/book/show/36510196-old-man-s-war">Old Man's War</a></li> <li><a href="https://www.goodreads.com/book/show/38453346-the-inspector-barlach-mysteries">The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion</a></li> <li><a href="https://www.goodreads.com/book/show/18842344-fant-mas">Fantômas</a></li> <li><a href="https://www.goodreads.com/book/show/40793127-foundation">Foundation</a></li> <li><a href="https://www.goodreads.com/book/show/13380806-out-of-the-silent-planet">Out of the Silent Planet</a></li> </ul> <p>The only book I really didn't like was "Invisible Cities".</p> <h4 id="the-whole-list">The whole list</h4><ul> <li><a href="https://www.goodreads.com/book/show/18782460-march-violets">March Violets</a> by Philip Kerr (Scottish)</li> <li><a href="https://www.goodreads.com/book/show/25299696-liberty-bar">Liberty Bar</a> by Georges Simenon (Belgian)</li> <li><a 
href="https://www.goodreads.com/book/show/20018218-the-late-monsieur-gallet">The Late Monsieur Gallet</a> by Georges Simenon (Belgian)</li> <li><a href="https://www.goodreads.com/book/show/35901747-dubliners">Dubliners</a> by James Joyce (Irish)</li> <li><a href="https://www.goodreads.com/book/show/11580940-tales-of-the-city">Tales of the City</a> by Armistead Maupin (American)</li> <li><a href="https://www.goodreads.com/book/show/52971537-the-third-policeman">The Third Policeman</a> by Flann O'Brien (Irish)</li> <li><a href="https://www.goodreads.com/book/show/6522120-44-scotland-street">44 Scotland Street</a> by Alexander McCall Smith (British-African)</li> <li><a href="https://www.goodreads.com/book/show/23209197-knots-and-crosses">Knots and Crosses</a> by Ian Rankin (Scottish)</li> <li><a href="https://www.goodreads.com/book/show/35598044-i-hear-your-voice">I Hear Your Voice</a> by Kim Young Ha (South Korean)</li> <li><a href="https://www.goodreads.com/book/show/17406654-the-golem-and-the-jinni">The Golem and the Jinni</a> by Helene Wecker (American)</li> <li><a href="https://www.goodreads.com/book/show/25541152-the-tokyo-zodiac-murders">The Tokyo Zodiac Murders</a> by Shimada Sōji (Japanese)</li> <li><a href="https://www.goodreads.com/book/show/8130077-the-screwtape-letters">The Screwtape Letters</a> by C.S. Lewis (English)</li> <li><a href="https://www.goodreads.com/book/show/38886181-neverwhere">Neverwhere</a> by Neil Gaiman (English)</li> <li><a href="https://www.goodreads.com/book/show/36510196-old-man-s-war">Old Man's War</a> by John Scalzi (American)</li> <li><a href="https://www.goodreads.com/book/show/9285319-tales-from-earthsea">Tales from Earthsea</a> by Ursula K. Le Guin (American)</li> <li><a href="https://www.goodreads.com/book/show/23632478-solaris">Solaris</a> by Stanisław Lem (Polish)</li> <li><a href="https://www.goodreads.com/book/show/16029682-a-wizard-of-earthsea">A Wizard of Earthsea</a> by Ursula K. 
Le Guin (American)</li> <li><a href="https://www.goodreads.com/book/show/11607290-planet-of-the-apes">Planet of the Apes</a> by Pierre Boulle (French)</li> <li><a href="https://www.goodreads.com/book/show/25171354-the-dead-mountaineer-s-inn">The Dead Mountaineer's Inn</a> by Arkady Strugatsky (Russian)</li> <li><a href="https://www.goodreads.com/book/show/49605492-invisible-cities">Invisible Cities</a> by Italo Calvino (Cuban-born Italian)</li> <li><a href="https://www.goodreads.com/book/show/38453346-the-inspector-barlach-mysteries">The Inspector Barlach Mysteries: The Judge and His Hangman and Suspicion</a> by Friedrich Dürrenmatt (Swiss)</li> <li><a href="https://www.goodreads.com/book/show/18842344-fant-mas">Fantômas</a> by Marcel Allain (French)</li> <li><a href="https://www.goodreads.com/book/show/18882869-all-quiet-on-the-western-front">All Quiet on the Western Front</a> by Erich Maria Remarque (German)</li> <li><a href="https://www.goodreads.com/book/show/22346782-a-crime-in-holland">A Crime in Holland</a> by Georges Simenon (Belgian)</li> <li><a href="https://www.goodreads.com/book/show/32076294-the-wonderful-adventure-of-nils-holgersson">The Wonderful Adventure of Nils Holgersson</a> by Selma Lagerlöf (Swedish)</li> <li><a href="https://www.goodreads.com/book/show/40793127-foundation">Foundation</a> by Isaac Asimov (Russian-born American)</li> <li><a href="https://www.goodreads.com/book/show/13380806-out-of-the-silent-planet">Out of the Silent Planet</a> by C.S. 
Lewis (English)</li> <li><a href="https://www.goodreads.com/book/show/19847968-the-spy-who-came-in-from-the-cold">The Spy Who Came in from the Cold</a> by John le Carré (English)</li> <li><a href="https://www.goodreads.com/book/show/19792871-the-bat">The Bat</a> by Jo Nesbø (Norwegian)</li> <li><a href="https://www.goodreads.com/book/show/26167126-the-mouse-that-roared">The Mouse That Roared</a> by Leonard Wibberley (Irish-born American)</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Out of 47 books read this year, here&#39;s the 20 I recommend to you (gave them 4/5 stars or better). I&#39;m trying to read more non-English authors so I&#39;d love to hear if there are authors with similar style on this list you&#39;d recommend!<a href="https://t.co/FjHcvHpRSr">https://t.co/FjHcvHpRSr</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1343242325791805447?ref_src=twsrc%5Etfw">December 27, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/year-in-books-2020.htmlSun, 27 Dec 2020 00:00:00 +0000Static analysis with semgrep: practical examples using Dockerhttp://notes.eatonphil.com/static-analysis-with-semgrep.html<p>In this post we'll get a basic semgrep environment set up in Docker running some custom rules against our code.</p> <h3 id="existing-linters">Existing linters</h3><p>Linters like <a href="https://www.pylint.org/">pylint</a> for Python or <a href="https://eslint.org/">eslint</a> for JavaScript are great for general, broad language standards. But what about common nits in code review like using print statements instead of a logger, or using a defer statement inside a for loop (Go specific), or the existence of multiple nested loops?</p> <p>Most developers don't have experience working with language parsing. So it's fairly uncommon in small- and medium-sized teams to see custom linting rules. 
And while no single linter or language is that much more complex than any other (it's all just AST operations), there is a small penalty to learning the AST and framework for each language linter.</p> <h3 id="semgrep">Semgrep</h3><p><a href="https://semgrep.dev/">Semgrep</a> is a generic tool for finding patterns in source code. Unlike traditional regex (and traditional grep) it can find recursive patterns. This makes it an especially useful tool to learn, since the same approach finds patterns in any language.</p> <p>An advantage of semgrep rules is that you can learn the semgrep pattern matching syntax (which is surprisingly easy) and then write rules for any language you like.</p> <p>And while the <a href="https://semgrep.dev/editor">online rule tester</a> is awesome, I had a hard time going from that to a working sample on my own laptop with Docker. So that's what we'll do here.</p> <h3 id="catching-print-statements-in-python">Catching print statements in Python</h3><p>Let's say we want a script to fail on any use of print statements in Python:</p> <div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">cat</span> <span class="n">test</span><span class="o">/</span><span class="n">python</span><span class="o">/</span><span class="n">simple</span><span class="o">-</span><span class="nb">print</span><span class="o">.</span><span class="n">py</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;DEBUG: here&quot;</span><span class="p">)</span> <span class="nb">print</span><span class="p">(</span><span class="s2">&quot;DEBUG: &quot;</span><span class="p">,</span> <span class="s2">&quot;now here&quot;</span><span class="p">)</span> </pre></div> <p>The current <a href="https://semgrep.dev/editor">default example</a> shown in the online editor happens to be for just this. 
Click the Advanced tab and you'll see the following:</p> <div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span> <span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span> <span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">print(&quot;...&quot;)</span> <span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Semgrep found a match</span> <span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span> </pre></div> <p>Copy this into <code>config.yml</code>. Let's modify the pattern to warn on all print calls, not just print calls with a single string argument:</p> <div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span> <span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span> <span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">print(...)</span> <span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Semgrep found a match</span> <span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span> </pre></div> <p>The editor doesn't mention it 
(nor do any docs I can find) but we also need to include two keys in the individual rule object: <code>mode</code> and <code>languages</code>.</p> <div class="highlight"><pre><span></span><span class="nt">rules</span><span class="p">:</span> <span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span> <span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">print(...)</span> <span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Semgrep found a match</span> <span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span> <span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span> <span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">&quot;generic&quot;</span><span class="p p-Indicator">]</span> </pre></div> <p>Semgrep fails really weirdly if you set <code>mode</code> to anything other than <code>search</code>, but it won't warn you that what you set is garbage. The <code>languages</code> setting is similarly fickle and doesn't give you much feedback if you set it incorrectly.</p> <p class="note"> Also, I'm using the "generic" language here because I don't understand the difference between languages and as far as I'm concerned the syntax I'm using here is already pretty generic. 
</p><p>We run the semgrep Docker image:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">&quot;</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src&quot;</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>config.yml<span class="w"> </span>test/python A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information. running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules... test/python/simple-print.py severity:warning<span class="w"> </span>rule:fail-on-print:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match <span class="m">2</span>:print<span class="o">(</span><span class="s2">&quot;DEBUG: here&quot;</span><span class="o">)</span> ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">1</span><span class="w"> </span>files:<span class="w"> </span><span class="m">1</span><span class="w"> </span>findings </pre></div> <p>And there we've got our warning!</p> <p class="note"> It's not completely clear to me why we're getting warned about a new version when we've pulled <code>latest</code> as the linked docs suggest. Maybe there's a newer version that hasn't made it into a Docker image yet. 
</p><h3 id="catching-fmt.print*-statements-in-go">Catching fmt.Print* statements in Go</h3><p>Let's say we also want to fail on print statements in Go (because we should use a logger instead):</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">golang</span><span class="o">/</span><span class="nx">simple</span><span class="o">-</span><span class="nx">print</span><span class="p">.</span><span class="k">go</span> <span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;here&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">a</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;%s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">)</span> <span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;My crazy error&quot;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>We could 
try to look for any <code>import "fmt"</code> code in a file but that would fail on uses of <code>fmt.Sprintf</code> or <code>fmt.Errorf</code> which are fine. Instead we'll just focus on uses of <code>fmt.Printf</code> or <code>fmt.Println</code>:</p> <div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">$ cat go-config.yml</span> <span class="l l-Scalar l-Scalar-Plain">rules</span><span class="p p-Indicator">:</span> <span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-print</span> <span class="w"> </span><span class="nt">pattern-either</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fmt.Printf(...)</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fmt.Println(...)</span> <span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Semgrep found a match</span> <span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span> <span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span> <span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">&quot;generic&quot;</span><span class="p p-Indicator">]</span> </pre></div> <p>Run the Go config against the Go files:</p> <div 
class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">&quot;</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src&quot;</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config.yml<span class="w"> </span>test/golang A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information. running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules... test/golang/simple-print.go severity:warning<span class="w"> </span>rule:fail-on-print:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match <span class="m">8</span>:fmt.Printf<span class="o">(</span><span class="s2">&quot;%s\n&quot;</span>,<span class="w"> </span>a<span class="o">)</span> -------------------------------------------------------------------------------- <span class="m">7</span>:fmt.Println<span class="o">(</span>a<span class="o">)</span> ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">1</span><span class="w"> </span>files:<span class="w"> </span><span class="m">2</span><span class="w"> </span>findings </pre></div> <p>Cool! Making some sense. 
Now let's try a harder pattern.</p> <h3 id="catching-triple-nested-for-loops">Catching triple-nested for loops</h3><p>Let's try to warn on the triple-nested loop in this code:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span class="nx">golang</span><span class="o">/</span><span class="nx">loopy</span><span class="p">.</span><span class="k">go</span> <span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span 
class="p">&lt;</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">j</span> <span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">k</span><span class="o">++</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">k</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>If we want to catch the use of nested for loops here then we'll need to search for the loops surrounded by arbitrary syntax. 
Semgrep's <code>...</code> syntax makes this easy.</p> <div class="highlight"><pre><span></span><span class="l l-Scalar l-Scalar-Plain">$ cat go-config2.yml</span> <span class="l l-Scalar l-Scalar-Plain">rules</span><span class="p p-Indicator">:</span> <span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">id</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">fail-on-3-loop</span> <span class="w"> </span><span class="nt">pattern</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">for ... {</span> <span class="w"> </span><span class="no">...</span> <span class="w"> </span><span class="no">for ... {</span> <span class="w"> </span><span class="no">...</span> <span class="w"> </span><span class="no">for ... {</span> <span class="w"> </span><span class="no">...</span> <span class="w"> </span><span class="no">}</span> <span class="w"> </span><span class="no">...</span> <span class="w"> </span><span class="no">}</span> <span class="w"> </span><span class="no">...</span> <span class="w"> </span><span class="no">}</span> <span class="w"> </span><span class="nt">message</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Semgrep found a match</span> <span class="w"> </span><span class="nt">severity</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">WARNING</span> <span class="w"> </span><span class="nt">mode</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">search</span> <span class="w"> </span><span class="nt">languages</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">[</span><span class="s">&quot;generic&quot;</span><span class="p p-Indicator">]</span> </pre></div> <p>And run semgrep:</p> <div 
class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">&quot;</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src&quot;</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config2.yml<span class="w"> </span>test/golang A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information. running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules... test/golang/loopy.go severity:warning<span class="w"> </span>rule:fail-on-3-loop:<span class="w"> </span>Semgrep<span class="w"> </span>found<span class="w"> </span>a<span class="w"> </span>match <span class="m">7</span>:for<span class="w"> </span>i<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">;</span><span class="w"> </span>i<span class="w"> </span>&lt;<span class="w"> </span><span class="m">10</span><span class="p">;</span><span class="w"> </span>i++<span class="w"> </span><span class="o">{</span> <span class="m">8</span>:<span class="w"> </span>log.Print<span class="o">(</span>i<span class="o">)</span> <span class="m">9</span>: <span class="m">10</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>j<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">;</span><span class="w"> </span>j<span class="w"> </span>&lt;<span class="w"> </span><span class="m">100</span><span 
class="p">;</span><span class="w"> </span>j++<span class="w"> </span><span class="o">{</span> <span class="m">11</span>:<span class="w"> </span>c<span class="w"> </span>:<span class="o">=</span><span class="w"> </span>i<span class="w"> </span>*<span class="w"> </span>j <span class="m">12</span>: <span class="m">13</span>:<span class="w"> </span>going<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="nb">true</span> <span class="m">14</span>:<span class="w"> </span>k<span class="w"> </span>:<span class="o">=</span><span class="w"> </span><span class="m">0</span> <span class="m">15</span>:<span class="w"> </span><span class="k">for</span><span class="w"> </span>going<span class="w"> </span><span class="o">{</span> <span class="m">16</span>:<span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span>c<span class="w"> </span><span class="o">{</span> --------<span class="w"> </span><span class="o">[</span>hid<span class="w"> </span><span class="m">10</span><span class="w"> </span>additional<span class="w"> </span>lines,<span class="w"> </span>adjust<span class="w"> </span>with<span class="w"> </span>--max-lines-per-finding<span class="o">]</span><span class="w"> </span>-------- ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">2</span><span class="w"> </span>files:<span class="w"> </span><span class="m">1</span><span class="w"> </span>findings </pre></div> <p>That's just swell.</p> <h3 id="limits-of-static-analysis">Limits of static analysis</h3><p>Now let's say we refactor one of the inner loops into its own function.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="nx">cat</span><span class="w"> </span><span class="nx">test</span><span class="o">/</span><span 
class="nx">golang</span><span class="o">/</span><span class="nx">loopy</span><span class="p">.</span><span class="k">go</span> <span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="kd">func</span><span class="w"> </span><span class="nx">inner</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">j</span> <span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">going</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">k</span><span class="o">++</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">k</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span 
class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">10</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">100</span><span class="p">;</span><span class="w"> </span><span class="nx">j</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">inner</span><span class="p">(</span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">j</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">doneFirst</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span 
class="p">}</span> <span class="p">}</span> </pre></div> <p>And run semgrep again:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-v<span class="w"> </span><span class="s2">&quot;</span><span class="si">${</span><span class="nv">PWD</span><span class="si">}</span><span class="s2">:/src&quot;</span><span class="w"> </span>returntocorp/semgrep<span class="w"> </span>--config<span class="o">=</span>go-config2.yml<span class="w"> </span>test/golang <span class="w"> </span>A<span class="w"> </span>new<span class="w"> </span>version<span class="w"> </span>of<span class="w"> </span>Semgrep<span class="w"> </span>is<span class="w"> </span>available.<span class="w"> </span>Please<span class="w"> </span>see<span class="w"> </span>https://github.com/returntocorp/semgrep#upgrading<span class="w"> </span><span class="k">for</span><span class="w"> </span>more<span class="w"> </span>information. <span class="w"> </span>running<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules... <span class="w"> </span>ran<span class="w"> </span><span class="m">1</span><span class="w"> </span>rules<span class="w"> </span>on<span class="w"> </span><span class="m">2</span><span class="w"> </span>files:<span class="w"> </span><span class="m">0</span><span class="w"> </span>findings </pre></div> <p>Well great. The triple-nested loop still exists but we can't find it anymore because the nesting is no longer syntactically obvious: the innermost loop now sits behind a function call.</p> <p>At this point we'd need to start getting into linting based on runtime analysis. If you know of a tool that does this and lets you write rules like semgrep for it, please tell me!</p> <h3 id="in-summary">In summary</h3><p>In the end though, it's still very useful to be able to learn a single language for writing syntax rules at a high level to enforce behavior in code. 
Furthermore, a generic syntax matcher helps you easily write rules for things that don't already have linters, like YAML or JSON configuration or Vagrantfiles.</p> <p>It can be annoying to work around some missing docs in semgrep but overall it's a great tool for the kit.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/semgrep?src=hash&amp;ref_src=twsrc%5Etfw">#semgrep</a> is a really neat tool for syntactic analysis. Here are a few simple examples (catch print statements, triple nested loops, etc.) using Docker. Includes some necessary info the docs don&#39;t get into<a href="https://t.co/UDHEH5JmOa">https://t.co/UDHEH5JmOa</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1340785372364738562?ref_src=twsrc%5Etfw">December 20, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/static-analysis-with-semgrep.htmlSun, 20 Dec 2020 00:00:00 +0000Emulating linux/AMD64 userland: interpreting an ELF binaryhttp://notes.eatonphil.com/emulating-amd64-starting-with-elf.html<p>In this post we'll stumble toward a working emulator for a barebones C program compiled for linux/AMD64. 
The approach will be based more on observation than on following a spec: a great way to quickly become familiar with a topic, and a bad way to guarantee correctness.</p> <p>The goal:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">4</span><span class="p">;</span> <span class="o">}</span> $<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c $<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">4</span> </pre></div> <p>This may look ridiculously simple but when you don't know how to deal with a binary or how instructions are encoded, it will take a few hours to write an emulator that can generally handle this program!</p> <p>Code for this project is <a href="https://github.com/eatonphil/go-amd64-emulator">available on GitHub</a>.</p> <h3 id="background">Background</h3><p>AMD64, x86_64 or x64 are different names for AMD's widely adopted 64-bit extension to Intel's x86 instruction set (i.e. the encoding and semantics of x86 binaries). AMD64 is a superset of x86 (introducing 64-bit registers and operations) and thus backwards compatible with x86 programs.</p> <p class="note"> A year and a half ago I first got into emulation with an <a href="https://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.html">AMD64 emulator in JavaScript</a>. The JavaScript emulator interpreted the textual representation of AMD64 programs (e.g. <code>MOV RBP, RSP</code>, Intel's assembly syntax). 
A C program had to be compiled with <code>-S</code> to produce an assembly file that the JavaScript emulator could read (i.e. <code>gcc -S tests/simple.c</code>). This was a great way to get started with emulation by ignoring the complexity of encoded instructions and executable formats. </p><p>If we dig into the binary file produced by gcc on Linux we learn that it is an <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF file</a>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c $<span class="w"> </span>file<span class="w"> </span>a.out a.out:<span class="w"> </span>ELF<span class="w"> </span><span class="m">64</span>-bit<span class="w"> </span>LSB<span class="w"> </span>executable,<span class="w"> </span>x86-64,<span class="w"> </span>version<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">(</span>SYSV<span class="o">)</span>,<span class="w"> </span>dynamically<span class="w"> </span>linked,<span class="w"> </span>interpreter<span class="w"> </span>/lib64/ld-linux-x86-64.so.2,<span class="w"> </span>BuildID<span class="o">[</span>sha1<span class="o">]=</span>d0b5c742b9fbcbcca4dfa9438a8437a8478a51bb,<span class="w"> </span><span class="k">for</span><span class="w"> </span>GNU/Linux<span class="w"> </span><span class="m">3</span>.2.0,<span class="w"> </span>not<span class="w"> </span>stripped </pre></div> <p>ELF is responsible for surrounding the actual binary-encoded program instructions with metadata on exported and imported C identifiers and the program entrypoint. But for simple programs like the one this initial emulator targets, we can ignore exports and imports. 
We'll only use the ELF metadata to find out where the instructions for our <code>main</code> function start.</p> <h3 id="where-is-main?">Where is main?</h3><p>If we use an ELF reader+disassembler on the binary generated by gcc and search for <code>main</code> we can find its address.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">&#39;&lt;main&gt;&#39;</span> <span class="m">0000000000401106</span><span class="w"> </span>&lt;main&gt;: <span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp <span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp <span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax <span class="w"> </span>40110f:<span class="w"> </span>5d<span class="w"> </span>pop<span class="w"> </span>%rbp <span class="w"> </span><span class="m">401110</span>:<span class="w"> </span>c3<span class="w"> </span>retq <span class="w"> </span><span class="m">401111</span>:<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopw<span class="w"> </span>%cs:0x0<span 
class="o">(</span>%rax,%rax,1<span class="o">)</span> <span class="w"> </span><span class="m">401118</span>:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span> <span class="w"> </span>40111b:<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">44</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopl<span class="w"> </span>0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span> <span class="m">0000000000401120</span><span class="w"> </span>&lt;__libc_csu_init&gt;: </pre></div> <p>This means that the function, <code>main</code>, starts at address <code>0x401106</code> in memory. Furthermore, this implies that the binary must be loaded into CPU memory such that the CPU can jump here to execute our program.</p> <p>In truth, <code>main</code> is not this program's entrypoint. 
If we run <code>objdump -x a.out</code> we can see that the ELF entrypoint is <code>0x401020</code>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-x<span class="w"> </span>a.out a.out:<span class="w"> </span>file<span class="w"> </span>format<span class="w"> </span>elf64-x86-64 a.out architecture:<span class="w"> </span>i386:x86-64,<span class="w"> </span>flags<span class="w"> </span>0x00000112: EXEC_P,<span class="w"> </span>HAS_SYMS,<span class="w"> </span>D_PAGED start<span class="w"> </span>address<span class="w"> </span>0x0000000000401020 Program<span class="w"> </span>Header: <span class="w"> </span>PHDR<span class="w"> </span>off<span class="w"> </span>0x0000000000000040<span class="w"> </span>vaddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>paddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>align<span class="w"> </span><span class="m">2</span>**3 <span class="w"> </span>filesz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>memsz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>flags<span class="w"> </span>r-- </pre></div> <p>This is because the actual entrypoint gcc sets up is a function called <code>_start</code>. 
The libc prelude beginning with <code>_start</code> is responsible for initializing the libc runtime, calling our <code>main</code> function and executing the exit syscall with the return value of <code>main</code>.</p> <div class="highlight"><pre><span></span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">&#39;&lt;_start&gt;&#39;</span> <span class="m">0000000000401020</span><span class="w"> </span>&lt;_start&gt;: <span class="w"> </span><span class="m">401020</span>:<span class="w"> </span>f3<span class="w"> </span>0f<span class="w"> </span>1e<span class="w"> </span>fa<span class="w"> </span>endbr64 <span class="w"> </span><span class="m">401024</span>:<span class="w"> </span><span class="m">31</span><span class="w"> </span>ed<span class="w"> </span>xor<span class="w"> </span>%ebp,%ebp <span class="w"> </span><span class="m">401026</span>:<span class="w"> </span><span class="m">49</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>d1<span class="w"> </span>mov<span class="w"> </span>%rdx,%r9 <span class="w"> </span><span class="m">401029</span>:<span class="w"> </span>5e<span class="w"> </span>pop<span class="w"> </span>%rsi <span class="w"> </span>40102a:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e2<span class="w"> </span>mov<span class="w"> </span>%rsp,%rdx <span class="w"> </span>40102d:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">83</span><span class="w"> </span>e4<span class="w"> </span>f0<span class="w"> </span>and<span class="w"> </span><span class="nv">$0</span>xfffffffffffffff0,%rsp <span class="w"> </span><span class="m">401031</span>:<span class="w"> </span><span class="m">50</span><span class="w"> </span>push<span class="w"> </span>%rax <span 
class="w"> </span><span class="m">401032</span>:<span class="w"> </span><span class="m">54</span><span class="w"> </span>push<span class="w"> </span>%rsp <span class="w"> </span><span class="m">401033</span>:<span class="w"> </span><span class="m">49</span><span class="w"> </span>c7<span class="w"> </span>c0<span class="w"> </span><span class="m">90</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">40</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>x401190,%r8 <span class="w"> </span>40103a:<span class="w"> </span><span class="m">48</span><span class="w"> </span>c7<span class="w"> </span>c1<span class="w"> </span><span class="m">20</span><span class="w"> </span><span class="m">11</span><span class="w"> </span><span class="m">40</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>x401120,%rcx </pre></div> <p>But because all this libc initialization is relatively complicated we're just going to skip the actual ELF entrypoint for now. Our emulator will locate <code>main</code>, load the binary into memory, jump to the start of <code>main</code>, and set the exit code of the emulator to the value returned by <code>main</code>.</p> <p class="note"> As you can see, this ELF binary has its own hard-coded view of where it will be in memory. What if our CPU were to run multiple processes at once? We might give each process its own virtual memory space and map back to a real memory space so each process (and by extension, each compiler) doesn't have to think about how it fits into memory relative to other processes. 
</p><p>The last question to figure out is where to load the ELF binary into emulator memory so that addresses in memory are where the program expects.</p> <p>As it turns out, there is a piece of metadata called section headers, each of which contains an address and an offset from the start of the ELF file (program headers, like the one shown here, carry the same two values). Subtracting the offset from the address gives the location where the file expects to be in memory.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-x<span class="w"> </span>a.out a.out:<span class="w"> </span>file<span class="w"> </span>format<span class="w"> </span>elf64-x86-64 a.out architecture:<span class="w"> </span>i386:x86-64,<span class="w"> </span>flags<span class="w"> </span>0x00000112: EXEC_P,<span class="w"> </span>HAS_SYMS,<span class="w"> </span>D_PAGED start<span class="w"> </span>address<span class="w"> </span>0x0000000000401020 Program<span class="w"> </span>Header: <span class="w"> </span>PHDR<span class="w"> </span>off<span class="w"> </span>0x0000000000000040<span class="w"> </span>vaddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>paddr<span class="w"> </span>0x0000000000400040<span class="w"> </span>align<span class="w"> </span><span class="m">2</span>**3 <span class="w"> </span>filesz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>memsz<span class="w"> </span>0x00000000000002d8<span class="w"> </span>flags<span class="w"> </span>r-- </pre></div> <p>That is: <code>0x400040 (vaddr) - 0x40 (off) = 0x400000</code>. Judging from a Google search this seems to be a pretty common address where ELF binaries are loaded into memory.</p> <h3 id="elf-and-go">ELF and Go</h3><p>Binary file formats tend to be a pain to work with because, to enable greater compression, everything ends up being a pointer to something else. 
So you end up jumping all around the file just to stitch information back together.</p> <p>So the one third-party-ish library we'll use is Go's builtin <code>debug/elf</code> package. With this library we can load an ELF binary and iterate over symbols and sections to discover the location of <code>main</code> and the start address for the binary in memory.</p> <p>Editing in <code>main.go</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;debug/elf&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;io/ioutil&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">bin</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">filename</span><span class="p">,</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span 
class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bin</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">filename</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">elffile</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">NewFile</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">bin</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">symbols</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">elffile</span><span class="p">.</span><span class="nx">Symbols</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sym</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">sym</span><span class="p">.</span><span class="nx">Name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">STT_FUNC</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">ST_TYPE</span><span class="p">(</span><span class="nx">sym</span><span class="p">.</span><span class="nx">Info</span><span 
class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">STB_GLOBAL</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">ST_BIND</span><span class="p">(</span><span class="nx">sym</span><span class="p">.</span><span class="nx">Info</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">sym</span><span class="p">.</span><span class="nx">Value</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">entryPoint</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not find entrypoint symbol: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">entrySymbol</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sec</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="k">range</span><span class="w"> </span><span class="nx">elffile</span><span class="p">.</span><span class="nx">Sections</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Type</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">elf</span><span class="p">.</span><span class="nx">SHT_NULL</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Addr</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">sec</span><span class="p">.</span><span class="nx">Offset</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">startAddress</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Could not determine start address&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">process</span><span class="p">{</span> <span class="w"> </span><span class="nx">startAddress</span><span class="p">:</span><span class="w"> </span><span 
class="nx">startAddress</span><span class="p">,</span> <span class="w"> </span><span class="nx">entryPoint</span><span class="p">:</span><span class="w"> </span><span class="nx">entryPoint</span><span class="p">,</span> <span class="w"> </span><span class="nx">bin</span><span class="p">:</span><span class="w"> </span><span class="nx">bin</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span class="s">&quot;Binary not provided&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">proc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;main&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Start: 0x%x\nEntry: 0x%x\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">entryPoint</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>We can test on a basic compiled C program:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">4</span><span class="p">;</span> <span class="o">}</span> $<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c $<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out Start:<span class="w"> </span>0x400000 Entry:<span class="w"> </span>0x401106 </pre></div> <p>And verify against <code>objdump</code>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">&#39;&lt;main&gt;&#39;</span> <span class="m">0000000000401106</span><span class="w"> </span>&lt;main&gt;: <span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span 
class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp </pre></div> <p>And that's it for dealing with ELF. Now we can sketch out a virtual CPU and how we deal with interpreting instructions starting at this address.</p> <h3 id="the-cpu">The CPU</h3><p>AMD64 counts on being able to store values in registers and memory, sometimes through direct addressing and sometimes indirectly using stack operations (push and pop). And userland processes count on being loaded into CPU memory so the CPU can jump to the process entrypoint and start executing instructions from there.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">proc</span><span class="w"> </span><span class="o">*</span><span class="nx">process</span> <span class="w"> </span><span class="nx">mem</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span> <span class="w"> </span><span class="nx">tick</span><span class="w"> </span><span class="kd">chan</span><span class="w"> </span><span class="kt">bool</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">newCPU</span><span class="p">(</span><span class="nx">memory</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cpu</span><span class="p">{</span> <span class="w"> </span><span class="nx">mem</span><span class="p">:</span><span class="w"> </span><span class="nb">make</span><span
class="p">([]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">memory</span><span class="p">),</span> <span class="w"> </span><span class="nx">regfile</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">registerFile</span><span class="p">{},</span> <span class="w"> </span><span class="nx">tick</span><span class="p">:</span><span class="w"> </span><span class="nb">make</span><span class="p">(</span><span class="kd">chan</span><span class="w"> </span><span class="kt">bool</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>The <code>tick</code> channel is so that later on we can wrap the emulator in a terminal debugger. But by default we'll just set up a goroutine to tick forever.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatal</span><span class="p">(</span><span 
class="s">&quot;Binary not provided&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">proc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readELF</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="s">&quot;main&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;--debug&quot;</span><span class="p">:</span> 
<span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;-d&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// 40 MiB (0x400000 is 4 MiB)</span> <span class="w"> </span><span class="nx">cpu</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">newCPU</span><span class="p">(</span><span class="mh">0x400000</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">10</span><span class="p">)</span> <span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="nx">cpu</span><span class="p">.</span><span class="nx">run</span><span class="p">(</span><span class="nx">proc</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">debug</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">cpu</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cpu</span><span class="p">.</span><span class="nx">tick</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3
id="registers">Registers</h3><p>To emulate a simple program like our <code>tests/simple.c</code>, we'll only need to support a few common registers. The order is important so that we can use the Go identifiers when we want to refer to the <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers">encoded integer value of the register</a>.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="kt">int</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="c1">// These are in order of encoding value (i.e. rbp is 5)</span> <span class="w"> </span><span class="nx">rax</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">rcx</span> <span class="w"> </span><span class="nx">rdx</span> <span class="w"> </span><span class="nx">rbx</span> <span class="w"> </span><span class="nx">rsp</span> <span class="w"> </span><span class="nx">rbp</span> <span class="w"> </span><span class="nx">rsi</span> <span class="w"> </span><span class="nx">rdi</span> <span class="w"> </span><span class="nx">r8</span> <span class="w"> </span><span class="nx">r9</span> <span class="w"> </span><span class="nx">r10</span> <span class="w"> </span><span class="nx">r11</span> <span class="w"> </span><span class="nx">r12</span> <span class="w"> </span><span class="nx">r13</span> <span class="w"> </span><span class="nx">r14</span> <span class="w"> </span><span class="nx">r15</span> <span class="w"> </span><span class="nx">rip</span> <span class="w"> </span><span class="nx">rflags</span> <span class="p">)</span> <span class="kd">var</span><span class="w"> </span><span class="nx">registerMap</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="kd">map</span><span class="p">[</span><span class="nx">register</span><span class="p">]</span><span class="kt">string</span><span class="p">{</span> <span class="w"> </span><span class="nx">rax</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rax&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rcx</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rcx&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rdx</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rdx&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rbx</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rbx&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rsp</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rsp&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rbp</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rbp&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rsi</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rsi&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rdi</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rdi&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r8</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r8&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r9</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r9&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r10</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r10&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r11</span><span 
class="p">:</span><span class="w"> </span><span class="s">&quot;r11&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r12</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r12&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r13</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r13&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r14</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r14&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">r15</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;r15&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rip</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rip&quot;</span><span class="p">,</span> <span class="w"> </span><span class="nx">rflags</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rflags&quot;</span><span class="p">,</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">registerFile</span><span class="w"> </span><span class="p">[</span><span class="mi">18</span><span class="p">]</span><span class="kt">uint64</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span><span class="p">)</span><span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="nx">register</span><span class="p">)</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">regfile</span><span class="p">[</span><span class="nx">r</span><span 
class="p">]</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">regfile</span><span class="w"> </span><span class="o">*</span><span class="nx">registerFile</span><span class="p">)</span><span class="w"> </span><span class="nx">set</span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="nx">register</span><span class="p">,</span><span class="w"> </span><span class="nx">v</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">regfile</span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">v</span> <span class="p">}</span> </pre></div> <p>Of immediate importance will be the <code>rip</code>, <code>rsp</code>, and <code>rax</code> registers. <code>rip</code> is used to track the current instruction to process. It will generally be incremented except for when dealing with function calls and returns. <code>rsp</code> is used as a pointer to the top of a stack in memory. It is decremented as values are pushed onto this stack and incremented as they are popped. 
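</p> <p>On x86 the stack grows toward lower addresses, so a push moves <code>rsp</code> down and a pop moves it back up. Here is a minimal standalone sketch of that behavior. The <code>stackMachine</code> type and its <code>push</code> and <code>pop</code> methods are hypothetical names used only for illustration; they are not part of the emulator built in this post.</p>

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// stackMachine is a hypothetical stand-in for the emulator's memory and rsp.
type stackMachine struct {
	mem []byte
	rsp uint64
}

// push moves rsp down by 8 bytes, then stores v there in little-endian order.
func (m *stackMachine) push(v uint64) {
	m.rsp -= 8
	binary.LittleEndian.PutUint64(m.mem[m.rsp:m.rsp+8], v)
}

// pop loads the 8 bytes at rsp, then moves rsp back up by 8 bytes.
func (m *stackMachine) pop() uint64 {
	v := binary.LittleEndian.Uint64(m.mem[m.rsp : m.rsp+8])
	m.rsp += 8
	return v
}

func main() {
	// Assumption for this sketch: a 64-byte memory with an empty stack at the top.
	m := &stackMachine{mem: make([]byte, 64), rsp: 64}
	m.push(0xfe)
	fmt.Println(m.rsp)          // 56
	fmt.Println(m.pop(), m.rsp) // 254 64
}
```

<p>The value is stored least-significant byte first, which is the byte order x86 uses in memory.</p> <p>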
Finally, <code>rax</code> is used to pass function return values.</p> <h3 id="loading-a-program">Loading a program</h3><p>Running a program is a matter of loading the program into memory, setting the stack pointer to the last address of memory (in x86 the stack grows down), pointing <code>rip</code> at the entrypoint, and looping until the entrypoint function returns.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">to</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">to</span><span class="p">[</span><span class="nx">start</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">byte</span><span class="p">(</span><span class="nx">val</span><span 
class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xFF</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">&lt;-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span> <span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span 
class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="w"> </span><span class="c1">// TODO: deal with instructions</span> <span class="w"> </span><span class="c1">// move to next instruction</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">run</span><span class="p">(</span><span class="nx">proc</span><span class="w"> </span><span class="o">*</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">copy</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="p">:</span><span class="nx">proc</span><span class="p">.</span><span class="nx">startAddress</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">proc</span><span class="p">.</span><span class="nx">bin</span><span class="p">))],</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">bin</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span 
class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">proc</span><span class="p">.</span><span class="nx">entryPoint</span><span class="p">)</span> <span class="w"> </span><span class="nx">initialStackPointer</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">)</span><span class="o">-</span><span class="mi">8</span><span class="p">)</span> <span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nx">initialStackPointer</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loop</span><span class="p">(</span><span class="nx">initialStackPointer</span><span class="p">)</span> <span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Exit</span><span class="p">(</span><span class="nb">int</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rax</span><span 
class="p">)))</span> <span class="p">}</span> </pre></div> <p>We write the initial stack pointer address into the stack so that when the program finally returns, it will return to this address, at which point we can exit the program.</p> <p>And now we're ready to start interpreting instructions.</p> <h3 id="instruction-decoding">Instruction decoding</h3><p>Using <code>objdump</code> we get a sense for what the program decodes to.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> </span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A10<span class="w"> </span><span class="s1">&#39;&lt;main&gt;&#39;</span> <span class="m">0000000000401106</span><span class="w"> </span>&lt;main&gt;: <span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp <span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp <span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax <span class="w"> </span>40110f:<span class="w"> </span>5d<span class="w"> </span>pop<span class="w"> </span>%rbp <span class="w"> </span><span class="m">401110</span>:<span class="w"> </span>c3<span class="w"> </span>retq <span class="w"> </span><span class="m">401111</span>:<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span 
class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopw<span class="w"> </span>%cs:0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span> <span class="w"> </span><span class="m">401118</span>:<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span> <span class="w"> </span>40111b:<span class="w"> </span>0f<span class="w"> </span>1f<span class="w"> </span><span class="m">44</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>nopl<span class="w"> </span>0x0<span class="o">(</span>%rax,%rax,1<span class="o">)</span> <span class="m">0000000000401120</span><span class="w"> </span>&lt;__libc_csu_init&gt;: </pre></div> <p>We see that <code>0x55</code> means <code>push %rbp</code>. And we also see that instructions aren't a fixed number of bytes. Some are one byte, some are seven. Some (not shown) are <a href="https://stackoverflow.com/questions/14698350/x86-64-asm-maximum-bytes-for-an-instruction">even longer</a>.</p> <p>Thankfully instructions follow some fairly simple patterns. There are a set of prefix instructions and a set of real instructions. So far we should be able to tell on the first byte whether the instruction is a prefix instruction and, if not, how many bytes the instruction will take up on the whole.</p> <h4 id="push">push</h4><p>To support a new instruction, we'll look up <code>0x55</code> in an opcode table like <a href="http://ref.x86asm.net/coder64.html">this</a>. Clicking on <a href="http://ref.x86asm.net/coder64.html#x50">55</a> in the opcode index we see that this is indeed a push instruction. 
<code>50+r</code> means that we have to subtract <code>0x50</code> from the opcode to determine the register we should push.</p> <p>The register will be <code>0x55 - 0x50 = 5</code> which if we look up in a <a href="https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers">register table</a> is <code>rbp</code>. Since we set up our register enum in code in this order, we'll be able to just use the constant <code>rbp</code> in Go code.</p> <p>Finally, since the next instruction numerically is <code>0x58</code> we know that this instruction is identified by being between <code>0x50</code> and <code>0x57</code> inclusive. This is all the info we need to handle this instruction.</p> <div class="highlight"><pre><span></span><span class="c1">// helper for dumping byte arrays as hex</span> <span class="kd">func</span><span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;%s:&quot;</span> <span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kd">interface</span><span class="p">{}{</span><span class="nx">msg</span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">str</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot; %x&quot;</span> <span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="nx">str</span><span class="o">+</span><span class="s">&quot;\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="o">...</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">&lt;-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span> <span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span 
class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span> <span class="w"> </span><span class="nx">regvalue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x50</span><span class="p">))</span> <span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span 
class="p">)</span> <span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">regvalue</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">&quot;prog&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unknown instruction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="mi">1</span><span 
class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>If we try this out now we should expect it to panic on the second byte, <code>0x48</code>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out prog:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d<span class="w"> </span>c3 panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction goroutine<span class="w"> </span><span class="m">19</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>: main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000086c30,<span class="w"> </span>0x2800000<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:168<span class="w"> </span>+0x16d main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000086c30,<span class="w"> </span>0xc000086c00<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:180<span class="w"> </span>+0xac created<span class="w"> </span>by<span class="w"> </span>main.main <span class="w"> </span>/home/phil/tmp/goamd/main.go:211<span class="w"> </span>+0x286 </pre></div> <p>Looking good.</p> <h4 id="mov">mov</h4><p>Taking a look at the next two instructions with <code>objdump</code> we see <code>mov</code> encoded two different ways.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>objdump<span class="w"> </span>-d<span class="w"> </span>a.out<span class="w"> 
</span><span class="p">|</span><span class="w"> </span>grep<span class="w"> </span>-A4<span class="w"> </span><span class="s1">&#39;&lt;main&gt;&#39;</span> <span class="m">0000000000401106</span><span class="w"> </span>&lt;main&gt;: <span class="w"> </span><span class="m">401106</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span>push<span class="w"> </span>%rbp <span class="w"> </span><span class="m">401107</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>mov<span class="w"> </span>%rsp,%rbp <span class="w"> </span>40110a:<span class="w"> </span>b8<span class="w"> </span>fe<span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span><span class="m">00</span><span class="w"> </span>mov<span class="w"> </span><span class="nv">$0</span>xfe,%eax </pre></div> <p>Looking up <a href="http://ref.x86asm.net/coder64.html#x48">0x48</a> we see that this is a prefix instruction (the REX.W prefix) that switches the instruction that follows it to 64-bit operands. Some instructions like <code>pop</code> and <code>push</code> don't need this prefix to operate on 64-bit values. In any case, this just means we'll have to have a size flag that switches from 32-bit to 64-bit mode on seeing this prefix. This flag will be reset each time we start reading an instruction.</p> <p>To deal with prefixes in general we'll loop through bytes when processing an instruction until we no longer see a prefix byte. 
As we see prefix bytes we'll handle them accordingly.</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="nx">prefixBytes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">{</span><span class="mh">0x48</span><span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">&lt;-</span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span> <span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rip</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span 
class="nx">ip</span><span class="p">]</span> <span class="w"> </span><span class="nx">widthPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">32</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">prefixByte</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">prefixBytes</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">prefixByte</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isPrefixByte</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// 64 bit prefix signifier</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x48</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">widthPrefix</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">64</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">&quot;prog&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unknown prefix instruction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">ip</span><span class="o">++</span> <span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span> <span class="o">...</span> </pre></div> <p>Moving past this prefix we get to <a 
href="http://ref.x86asm.net/coder64.html#x89">0x89</a>. This instruction is for copying one register into another. The register operands are <a href="http://www.c-jump.com/CIS77/CPU/x86/X77_0270_modrm_byte.htm">encoded in the second byte</a>, <code>0xe5</code>, called the ModR/M byte. Pulling out the two registers is just a matter of shifting and bitmasking the right 3 bits for each.</p> <p>With this knowledge we can expand the instruction handling code.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mh">0x50</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// push</span> <span class="w"> </span><span class="nx">regvalue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x50</span><span class="p">))</span> <span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="nx">writeBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span 
class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">,</span><span class="w"> </span><span class="nx">regvalue</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">-</span><span class="mi">8</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0x89</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// mov r/m16/32/64, r16/32/64</span> <span class="w"> </span><span class="nx">ip</span><span class="o">++</span> <span class="w"> </span><span class="nx">inb2</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">]</span> <span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">((</span><span class="nx">inb2</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mi">0</span><span class="nx">b00111000</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="mi">3</span><span 
class="p">)</span> <span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">inb2</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mi">0</span><span class="nx">b111</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lhs</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rhs</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="s">&quot;prog&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">ip</span><span class="p">:</span><span class="nx">ip</span><span class="o">+</span><span class="mi">10</span><span class="p">])</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Unknown instruction&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Try emulating <code>a.out</code> again now. It will panic on the next unknown instruction, <code>0xb8</code>. From <code>objdump</code> disassembly we see this is another <code>mov</code> instruction.</p> <p>Hurray! There are apparently multiple ways the same instruction can be encoded. 
Looking it up in the opcode table, we see <a href="http://ref.x86asm.net/coder64.html#xB8">0xB8</a> is for when the value to be copied is a literal number. The operand is 32 bits, or four bytes, presumably because the instruction doesn't have the <code>0x48</code> prefix.</p> <div class="highlight"><pre><span></span><span class="c1">// helper for converting up to 8 little-endian bytes into a single integer</span> <span class="kd">func</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">from</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">,</span><span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">uint64</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">from</span><span class="p">[</span><span 
class="nx">start</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="nx">i</span><span class="p">)])</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="p">(</span><span class="mi">8</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">val</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mh">0xB8</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mh">0xC0</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// mov r16/32/64, imm16/32/64</span> <span class="w"> </span><span class="nx">lreg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0xB8</span><span class="p">)</span> <span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">ip</span><span class="o">+</span><span class="nb">uint64</span><span class="p">(</span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="nx">widthPrefix</span><span class="o">/</span><span class="mi">8</span><span class="p">)</span> <span class="w"> </span><span class="nx">ip</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">widthPrefix</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lreg</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">...</span> </pre></div> <p>Two more instructions to go: <code>pop</code> and <code>ret</code>.</p> <h3 id="a-terminal-debugger">A terminal debugger</h3><p>Taking a break for a moment, our system is already too complex to understand. 
It would be helpful to have a REPL so we can step through instructions and print register and memory values.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">dval</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="kt">uint64</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">reg</span><span class="p">,</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">registerMap</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">val</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">dval</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">reg</span><span class="p">),</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span 
class="nx">dval</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="p">(</span><span class="nx">dval</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;0x&quot;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">dval</span><span class="p">[:</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;0X&quot;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">dval</span><span class="p">[</span><span class="mi">2</span><span class="p">:],</span><span class="w"> </span><span class="mi">16</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseUint</span><span class="p">(</span><span class="nx">dval</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">repl</span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;go-amd64-emulator REPL&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">help</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">`commands:</span> <span class="s"> s/step: continue to next instruction</span> <span class="s"> r/registers [$reg]: print all register values or just $reg</span> <span class="s"> d/decimal: toggle hex/decimal printing</span> <span class="s"> m/memory $from $count: print memory values starting at $from until $from+$count</span> <span class="s"> h/help: print this`</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">help</span><span class="p">)</span> <span class="w"> </span><span class="nx">scanner</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewScanner</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span> <span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;%d&quot;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;&gt; &quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Scan</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">scanner</span><span class="p">.</span><span class="nx">Text</span><span class="p">()</span> <span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Split</span><span class="p">(</span><span class="nx">input</span><span class="p">,</span><span class="w"> </span><span class="s">&quot; &quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;h&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;help&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">help</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;m&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;memory&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;Invalid arguments: m/memory $from $count; use hex (0x10), decimal (10), 
or register name (rsp)&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">parts</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">to</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">resolveDebuggerValue</span><span class="p">(</span><span class="nx">parts</span><span class="p">[</span><span 
class="mi">2</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">msg</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">hbdebug</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;memory[&quot;</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">&quot;:&quot;</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">&quot;]&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">from</span><span class="o">+</span><span class="nx">to</span><span class="p">),</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">[</span><span class="nx">from</span><span class="p">:</span><span class="nx">from</span><span class="o">+</span><span class="nx">to</span><span class="p">])</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;d&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;decimal&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">intFormat</span><span 
class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;%d&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;0x%x&quot;</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Numbers displayed as hex&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">intFormat</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;%d&quot;</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Numbers displayed as decimal&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;r&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;registers&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">parts</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">filter</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">registerMap</span><span class="p">);</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">reg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">registerMap</span><span class="p">[</span><span class="nx">reg</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">filter</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span 
class="p">(</span><span class="s">&quot;%s:\t&quot;</span><span class="o">+</span><span class="nx">intFormat</span><span class="o">+</span><span class="s">&quot;\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">reg</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;s&quot;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;step&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">tick</span><span class="w"> </span><span class="o">&lt;-</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Let's try it out:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out<span class="w"> </span>--debug go-amd64-emulator<span class="w"> </span>REPL commands: <span class="w"> </span>s/step:<span class="w"> </span><span class="k">continue</span><span class="w"> </span>to<span class="w"> </span>next<span class="w"> </span>instruction <span class="w"> </span>r/registers<span class="w"> </span><span class="o">[</span><span class="nv">$reg</span><span class="o">]</span>:<span class="w"> </span>print<span class="w"> </span>all<span class="w"> </span>register<span class="w"> </span>values<span 
class="w"> </span>or<span class="w"> </span>just<span class="w"> </span><span class="nv">$reg</span> <span class="w"> </span>d/decimal:<span class="w"> </span>toggle<span class="w"> </span>hex/decimal<span class="w"> </span>printing <span class="w"> </span>m/memory<span class="w"> </span><span class="nv">$from</span><span class="w"> </span><span class="nv">$count</span>:<span class="w"> </span>print<span class="w"> </span>memory<span class="w"> </span>values<span class="w"> </span>starting<span class="w"> </span>at<span class="w"> </span><span class="nv">$from</span><span class="w"> </span><span class="k">until</span><span class="w"> </span><span class="nv">$from</span>+<span class="nv">$count</span> <span class="w"> </span>h/help:<span class="w"> </span>print<span class="w"> </span>this &gt;<span class="w"> </span>r rax:<span class="w"> </span><span class="m">0</span> rcx:<span class="w"> </span><span class="m">0</span> rdx:<span class="w"> </span><span class="m">0</span> rbx:<span class="w"> </span><span class="m">0</span> rsp:<span class="w"> </span><span class="m">41943040</span> rbp:<span class="w"> </span><span class="m">0</span> rsi:<span class="w"> </span><span class="m">0</span> rdi:<span class="w"> </span><span class="m">0</span> r8:<span class="w"> </span><span class="m">0</span> r9:<span class="w"> </span><span class="m">0</span> r10:<span class="w"> </span><span class="m">0</span> r11:<span class="w"> </span><span class="m">0</span> r12:<span class="w"> </span><span class="m">0</span> r13:<span class="w"> </span><span class="m">0</span> r14:<span class="w"> </span><span class="m">0</span> r15:<span class="w"> </span><span class="m">0</span> rip:<span class="w"> </span><span class="m">4198662</span> rflags:<span class="w"> </span><span class="m">0</span> &gt;<span class="w"> </span>m<span class="w"> </span>rip<span class="w"> </span><span class="m">10</span> memory<span class="o">[</span><span class="m">4198662</span>:4198672<span 
class="o">]</span>:<span class="w"> </span><span class="m">55</span><span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d &gt;<span class="w"> </span>s &gt;<span class="w"> </span>m<span class="w"> </span>rip<span class="w"> </span><span class="m">10</span> memory<span class="o">[</span><span class="m">4198663</span>:4198673<span class="o">]</span>:<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="m">89</span><span class="w"> </span>e5<span class="w"> </span>b8<span class="w"> </span><span class="m">4</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>5d<span class="w"> </span>c3 &gt;<span class="w"> </span>r rax:<span class="w"> </span><span class="m">0</span> rcx:<span class="w"> </span><span class="m">0</span> rdx:<span class="w"> </span><span class="m">0</span> rbx:<span class="w"> </span><span class="m">0</span> rsp:<span class="w"> </span><span class="m">41943032</span> rbp:<span class="w"> </span><span class="m">0</span> rsi:<span class="w"> </span><span class="m">0</span> rdi:<span class="w"> </span><span class="m">0</span> r8:<span class="w"> </span><span class="m">0</span> r9:<span class="w"> </span><span class="m">0</span> r10:<span class="w"> </span><span class="m">0</span> r11:<span class="w"> </span><span class="m">0</span> r12:<span class="w"> </span><span class="m">0</span> r13:<span class="w"> </span><span class="m">0</span> r14:<span class="w"> </span><span class="m">0</span> r15:<span class="w"> </span><span class="m">0</span> rip:<span class="w"> </span><span 
class="m">4198663</span> rflags:<span class="w"> </span><span class="m">0</span> &gt;<span class="w"> </span>^D </pre></div> <p>Now we can inspect the system interactively.</p> <h3 id="pop">pop</h3><p>Re-immersing ourselves in the state of things, we now panic on <code>0x5D</code>.</p> <div class="highlight"><pre><span></span>./main<span class="w"> </span>a.out prog:<span class="w"> </span>5d<span class="w"> </span>c3<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span> panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction goroutine<span class="w"> </span><span class="m">5</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>: main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000098ae0,<span class="w"> </span>0x2800000<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:219<span class="w"> </span>+0x2c5 main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000098ae0,<span class="w"> </span>0xc000098ab0<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:231<span class="w"> </span>+0xac created<span class="w"> </span>by<span class="w"> </span>main.main <span class="w"> </span>/home/phil/tmp/goamd/main.go:358<span class="w"> </span>+0x286 </pre></div> <p>Looking <a href="http://ref.x86asm.net/coder64.html#x5D">this up</a> we see this is part of <code>58+r</code>, <code>pop</code>. Similar to <code>push</code>, we subtract <code>0x58</code> from the byte to get the register to pop into. 
The stack operation is the reverse of <code>push</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">cpu</span><span class="p">)</span><span class="w"> </span><span class="nx">loop</span><span class="p">(</span><span class="nx">entryReturnAddress</span><span class="w"> </span><span class="kt">uint64</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mh">0x58</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mh">0x60</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// pop</span> <span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">inb1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mh">0x58</span><span class="p">)</span> <span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span 
class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">lhs</span><span class="p">,</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">))</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">+</span><span class="mi">8</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">...</span> </pre></div> <p>Build and run for the final panic:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out prog:<span class="w"> </span>c3<span class="w"> </span><span class="m">66</span><span class="w"> </span>2e<span class="w"> </span>f<span class="w"> </span>1f<span class="w"> </span><span class="m">84</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span><span class="w"> </span><span class="m">0</span> panic:<span class="w"> </span>Unknown<span class="w"> </span>instruction goroutine<span class="w"> </span><span class="m">8</span><span class="w"> </span><span class="o">[</span>running<span class="o">]</span>: 
main.<span class="o">(</span>*cpu<span class="o">)</span>.loop<span class="o">(</span>0xc000060c30,<span class="w"> </span>0x2800000<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:224<span class="w"> </span>+0x345 main.<span class="o">(</span>*cpu<span class="o">)</span>.run<span class="o">(</span>0xc000060c30,<span class="w"> </span>0xc000060c00<span class="o">)</span> <span class="w"> </span>/home/phil/tmp/goamd/main.go:236<span class="w"> </span>+0xac created<span class="w"> </span>by<span class="w"> </span>main.main <span class="w"> </span>/home/phil/tmp/goamd/main.go:363<span class="w"> </span>+0x286 </pre></div> <h3 id="ret">ret</h3><p>Looking up <a href="http://ref.x86asm.net/coder64.html#xC3">0xC3</a> we see that it is indeed <code>ret</code>. This instruction's responsibility is to pop the top of the stack into <code>rip</code>, jumping back to the caller.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inb1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mh">0xC3</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="c1">// ret</span> <span class="w"> </span><span class="nx">sp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">rsp</span><span class="p">)</span> <span class="w"> </span><span class="nx">retAddress</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">readBytes</span><span class="p">(</span><span class="nx">c</span><span class="p">.</span><span class="nx">mem</span><span class="p">,</span><span class="w"> </span><span class="nx">sp</span><span 
class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rsp</span><span class="p">,</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nx">sp</span><span class="o">+</span><span class="mi">8</span><span class="p">))</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">regfile</span><span class="p">.</span><span class="nx">set</span><span class="p">(</span><span class="nx">rip</span><span class="p">,</span><span class="w"> </span><span class="nx">retAddress</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>-o<span class="w"> </span>main $<span class="w"> </span>./main<span class="w"> </span>a.out $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">4</span> </pre></div> <p>What if we modify <code>tests/simple.c</code>?</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/simple.c int<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">254</span><span class="p">;</span> <span class="o">}</span> $<span class="w"> </span>gcc<span class="w"> </span>tests/simple.c $<span class="w"> </span>./main<span class="w"> </span>a.out<span class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span 
class="nv">$?</span> <span class="m">254</span> </pre></div> <p>Not bad!</p> <h3 id="process-and-next-steps">Process and next steps</h3><p>Getting this far took a lot of trial and error, much of it omitted from this post. Setting up the REPL was critical to debugging mistakes. But aggressive unit testing would probably have been similarly fruitful. In the end, the most bug-prone aspects are basic arithmetic (off-by-one errors and converting bytes to/from integers). The part that's not terribly hard is actually interpreting instructions! But it's made easier by greatly simplifying the problem and ignoring a great many cases.</p> <p>Along the way it would have been helpful to also disassemble so that instead of just dumping memory at the instruction pointer we print the instructions we thought we were going to process. That may be a next goal.</p> <p>Otherwise the typical goals are around getting syscall support, function call support, and porting these simple examples to Windows and macOS for the experience.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here&#39;s take two on writing an emulator for linux/amd64 in Go. This time we&#39;re starting with ELF binaries, but still ignoring libc and jumping straight to main.<a href="https://t.co/A87r2RY21c">https://t.co/A87r2RY21c</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1332111601814691840?ref_src=twsrc%5Etfw">November 26, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/emulating-amd64-starting-with-elf.htmlThu, 26 Nov 2020 00:00:00 +0000The impact of management teams as a decision-making group, in startups and enterprisehttp://notes.eatonphil.com/the-impact-of-management-teams-on-startups-and-enterprises.html<p>Ambitious companies form management teams at every level above you, sometimes including you. Management teams meet periodically and have private chat rooms. 
They discuss customers, product and organizational direction. Sometimes discussions are well documented and periodically made public. Sometimes decisions are poorly telegraphed.</p> <p>Management teams do no inherent harm in a company with customers; employees outside of the management team can unearth customer usage data to discover meaningful places to contribute. For example, graphing historic server logs to discover the slowest requests, then figuring out why they are slow and how to fix them. Or even just paying attention to the most frequent questions sales asks and finding ways to clarify. (All of this under the assumption that even when there is solid product direction, good employees tend to have extra time at work and want to make good use of it.)</p> <p>For the first few years even in a well-funded startup with solid founders, there are few customers. Even under a solid product team, the product direction is not yet completely clear. The management team includes founders and non-engineering executives. As a decision-making group they are opaque. Employees outside the management team face a barrier in finding ways to meaningfully contribute. Ambitious, dedicated folks outside the team leave.</p> <h3 id="so-what?">So what?</h3><p>It is not clear to me how the natural (and not inherently bad) concept of management teams attracts and retains ambitious, dedicated non-founders at small companies. Maybe disenfranchisement is not important, or even necessary.</p> <p>Or maybe management teams as a decision-making group are too easily a substitute for developing a grassroots culture of collaboration and trust between marketing, sales, product and development.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New post! 
&quot;management teams as a decision-making group are too easily a substitute for developing a grassroots culture of collaboration and trust between marketing, sales, product and development.&quot;<a href="https://t.co/7RukBMI59h">https://t.co/7RukBMI59h</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1328482381314084864?ref_src=twsrc%5Etfw">November 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/the-impact-of-management-teams-on-startups-and-enterprises.htmlWed, 11 Nov 2020 00:00:00 +0000Standard ML in 2020http://notes.eatonphil.com/standard-ml-in-2020.html<p>Incredibly, Standard ML implementations are still actively developed. <a href="http://mlton.org/">MLton</a>, <a href="https://polyml.org">Poly/ML</a>, <a href="https://elsman.com/mlkit/">MLKit</a>, <a href="https://www.pllab.riec.tohoku.ac.jp/smlsharp/">SML#</a> and <a href="http://smlnj-gforge.cs.uchicago.edu/scm/viewvc.php/?root=smlnj">SML/NJ</a> are the most prominent. Discussion on the future direction of Standard ML <a href="https://github.com/SMLFamily/Successor-ML/issues">remains healthy as well</a>.</p> <p>And somehow OCaml's lesser-known cousin still beats out OCaml for multicore threading support (in Poly/ML).</p> <p>While MLton hasn't merged with <a href="https://github.com/kayceesrk/multiMLton">MultiMLton</a> or <a href="https://github.com/UBMLtonGroup/RTMLton">RTMLton</a> to support multicore, a <a href="https://github.com/mpllang/mpl">new fork of MLton with parallelism</a> is pretty far along and in active development at CMU.</p> <p class="note"> A commenter shared <a href="https://github.com/ManticoreProject/manticore">Manticore</a>, another implementation with parallelism support in active development at UChicago. </p><p>Furthermore, the last few years have welcomed some entirely new implementations. 
<a href="https://github.com/KeenS/webml">WebML</a>, by a prominent open source hacker, is written in Rust and compiles Standard ML to WebAssembly. <a href="https://sosml.org/">SOSML</a> is an interpreter written in TypeScript by former students of Saarland University. It features <a href="https://sosml.org/editor">a nifty online IDE</a>.</p> <p class="note"> A commenter shared <a href="https://github.com/SomewhatML/sml-compiler">SomewhatML</a>, an actively developed compiler for Standard ML written in Rust. </p><p>There have also been some new experimental spins on Standard ML in the last few years. <a href="https://github.com/julianhyde/morel">Morel</a> is an interpreter with some nice syntax extensions written in Java by the author of Apache Calcite. And <a href="https://github.com/elpinal/bright-ml">Bright ML</a> is a spin on Standard ML and OCaml written in Standard ML (and using the abandoned <a href="https://mosml.org/">Moscow ML</a> compiler of all implementations).</p> <p>So if you're looking for an easy intro to the ML family of languages, I still recommend the simplicity and performance of Standard ML and its small but definitely, surprisingly, not dead community. :)</p> <p>Additional resources:</p> <ul> <li><a href="https://smlfamily.github.io/">SML Family Site</a></li> <li><a href="https://smlfamily.github.io/Basis/index.html">SML Standard Library (Basis Library) Documentation</a></li> <li><a href="https://reddit.com/r/sml">/r/sml</a></li> </ul> <p>Are you using Standard ML? <a href="mailto:[email protected]">Let me know how/why!</a></p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Standard ML implementations are still in active development! 
There have even been some interesting new implementations pop up in the last few years.<a href="https://t.co/6kOcMKVfQa">https://t.co/6kOcMKVfQa</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1320487302418845696?ref_src=twsrc%5Etfw">October 25, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/standard-ml-in-2020.htmlSun, 25 Oct 2020 00:00:00 +0000The case for comments in codehttp://notes.eatonphil.com/the-case-for-comments-in-code.html<p>When I first started programming, especially when asked for code samples, my comments lacked purpose and would often duplicate in English what the code clearly indicated. I knew that "commenting is good" but as a beginner I had no further insight.</p> <p>Over time with the help of books like Clean Code, I grew disdainful of comments. Good code should be self-documenting. Whenever I needed to write a comment to explain something, I'd realize I could easily rename some key variable or function. I grew more comfortable with variables and functions with a few words in the title. Better to spend time on good code structure and naming.</p> <p class="note"> I have always left TODOs though, since TODOs can't so easily be expressed in variable names. But even these TODOs concerned me because they existed in my issue tracker, or maybe should have. </p><p>As I watched mature open source projects and mature engineers, I came to value well-documented pull requests. 
Solid pull requests include or link to all necessary background, opportunities failed or ignored, how to use, links to external bugs requiring workarounds and the results of performance evaluation.</p> <p>Beyond pull request descriptions, when I really wanted to grease a pull request I'd use the pull request UI to add comments calling reviewer attention to key changes in lines of the diff.</p> <p>Both kinds of guidance are a massive aid to reviewers, saving a lot of time.</p> <p>But when I'd find a bug in code -- and I knew there was good pull request documentation, even for pull requests as recent as six months ago -- I've been repeatedly failed by the pull request and <em>pull request comment</em> search exposed by Github and Gitlab.</p> <p>I <em>knew</em> there were links to documented oddities or bug reports in pull request threads. But practically speaking, for historic pull requests, pull request comments are useless.</p> <p>This is the single biggest reason I've started to push for more comments in code. More so than all other tools (issue tracker, code management system, etc.) comments in code have the greatest chance of still being around and <em>easily searchable</em> if they haven't been deleted.</p> <p class="note"> Don't get me started on pull request documentation in an external medium like Slack. It's so rewarding to get or give instant feedback on changes on instant messengers, but good luck finding that discussion 3 months later. 
</p><p>Every time I have to call out a line of code in a pull request, that's immediate cause for that code to be modified with comments.</p> <p>Maybe I wouldn't do this if Github/Gitlab exposed a Google Docs-like interface for browsing code line by line with links to all pull request comment threads.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">The biggest reason to add comments in code (often linking to documented oddities or bug reports) is because it&#39;s impossible to search pull request threads historically in every source control management UI I&#39;ve used.<a href="https://t.co/JlHWfbUH5z">https://t.co/JlHWfbUH5z</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1303130504993136642?ref_src=twsrc%5Etfw">September 8, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/the-case-for-comments-in-code.htmlMon, 07 Sep 2020 00:00:00 +0000Writing a simple Python compiler: 1. hello, fibonaccihttp://notes.eatonphil.com/writing-a-simple-python-compiler.html<p>In this post we'll write a Python to C compiler in Python. 
This is especially easy to do since Python has a <a href="https://docs.python.org/3/library/ast.html">builtin parser library</a> and because a number of <a href="https://docs.python.org/3/c-api/">CPython internals are exposed for extension writers</a>.</p> <p>By the end of this post, in a few hundred lines of Python, we'll be able to compile and run the following program:</p> <div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">cat</span> <span class="n">tests</span><span class="o">/</span><span class="n">recursive_fib</span><span class="o">.</span><span class="n">py</span> <span class="k">def</span> <span class="nf">fib</span><span class="p">(</span><span class="n">n</span><span class="p">):</span> <span class="k">if</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">0</span> <span class="ow">or</span> <span class="n">n</span> <span class="o">==</span> <span class="mi">1</span><span class="p">:</span> <span class="k">return</span> <span class="n">n</span> <span class="k">return</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)</span> <span class="o">+</span> <span class="n">fib</span><span class="p">(</span><span class="n">n</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="nb">print</span><span class="p">(</span><span class="n">fib</span><span class="p">(</span><span class="mi">40</span><span class="p">))</span> <span class="err">$</span> <span class="n">python3</span> <span class="n">pyc</span> <span class="n">tests</span><span class="o">/</span><span class="n">recursive_fib</span><span class="o">.</span><span class="n">py</span> <span class="err">$</span> <span class="o">./</span><span class="nb">bin</span><span class="o">/</span><span class="n">a</span><span class="o">.</span><span 
class="n">out</span> <span class="mi">102334155</span> </pre></div> <p>This post implements an extremely small subset of Python and <strong>completely gives up on even trying to manage memory</strong> because I cannot fathom manual reference counting. Maybe some day I'll find a way to swap in an easy GC like Boehm.</p> <p><a href="https://github.com/eatonphil/pyc">Source code for this project is available on Github.</a></p> <h3 id="dependencies">Dependencies</h3><p>We'll need Python3, GCC, libpython3, and clang-format.</p> <p>On Fedora-based systems:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>dnf<span class="w"> </span>install<span class="w"> </span>gcc<span class="w"> </span>python3-devel<span class="w"> </span>clang-format<span class="w"> </span>python3 </pre></div> <p>And on Debian-based systems:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>apt<span class="w"> </span>install<span class="w"> </span>gcc<span class="w"> </span>python3-dev<span class="w"> </span>clang-format<span class="w"> </span>python3 </pre></div> <p class="note"> This program will likely work as well on Windows, Mac, FreeBSD, etc. but I haven't gone through the trouble of testing this (or providing custom compiler directives). Pull requests welcome! 
</p><h3 id="a-hand-written-first-pass">A hand-written first-pass</h3><p>Before we get into the compiler, let's write the fibonacci program by hand in C using libpython.</p> <p>As described in the <a href="https://docs.python.org/3/extending/embedding.html#very-high-level-embedding">Python embedding guide</a> we'll need to include libpython and initialize it in our <code>main.c</code>:</p> <div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;Python.h&gt;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Py_Initialize</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>To compile against libpython, we'll use <a href="https://helpmanual.io/man1/python3-config/">python3-config</a> installed as part of <code>python3-devel</code> to tell us what should be linked at each step during compilation.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>-c<span class="w"> </span>-o<span class="w"> </span>main.o<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--cflags<span class="k">)</span><span class="w"> </span>main.c $<span class="w"> </span>gcc<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--ldflags<span class="k">)</span><span class="w"> </span>main.o $<span class="w"> </span>./a.out<span 
class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">0</span> </pre></div> <p>Cool! Now as we think about translating the fibonacci implementation, we want to keep everything as Python objects for as long as possible. This means passing and receiving <a href="https://docs.python.org/3/c-api/object.html">PyObject*</a> to and from all functions, and converting all C integers to <a href="https://docs.python.org/3/c-api/long.html">PyLong*</a>, a "subtype" of <code>PyObject*</code>. You can imagine that everything in Python is an <code>object</code> until you operate on it.</p> <p class="note"> For more information on objects in Python, check out the <a href="https://docs.python.org/3/reference/datamodel.html">Data model</a> page in Python docs. </p><p>To map a C integer to a <code>PyLong*</code> we use <a href="https://docs.python.org/3/c-api/long.html#c.PyLong_FromLong">PyLong_FromLong</a>. To map in reverse, we use <a href="https://docs.python.org/3/c-api/long.html#c.PyLong_AsLong">PyLong_AsLong</a>.</p> <p>To compare two <code>PyObject*</code>s we can use <a href="https://docs.python.org/3/c-api/object.html#c.PyObject_RichCompareBool">PyObject_RichCompareBool</a> which will handle the comparison regardless of the type of the two parameters. 
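</p> <p>As a rough Python-level picture of that call (a sketch under stated assumptions, not CPython's implementation): the function <code>rich_compare_bool</code> and the constants <code>PY_EQ</code> and <code>PY_NE</code> below are hypothetical names standing in for <code>PyObject_RichCompareBool</code>, <code>Py_EQ</code> and <code>Py_NE</code>. The C helper reports equality immediately when both operands are the same object, and otherwise defers to the operands' own comparison methods, whatever their types are.</p>

```python
import operator

# Hypothetical stand-ins for the C API constants Py_EQ and Py_NE.
PY_EQ = "=="
PY_NE = "!="

_OPS = {PY_EQ: operator.eq, PY_NE: operator.ne}


def rich_compare_bool(left, right, op):
    """Return 1 or 0, like PyObject_RichCompareBool does on success."""
    # Identity shortcut: the same object always compares equal to itself.
    if left is right:
        return 1 if op == PY_EQ else 0
    # Otherwise let the operands decide, regardless of their types.
    return 1 if _OPS[op](left, right) else 0


print(rich_compare_bool(7, 7.0, PY_EQ))  # 1: int and float compare by value
print(rich_compare_bool(7, "7", PY_EQ))  # 0: int and str are never equal
print(rich_compare_bool(7, 8, PY_NE))    # 1
```

<p>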
Without this helper we'd have to write complex checks to make sure that the two sides are the same and if they are, unwrap them into their underlying C value and compare the C value.</p> <p>We can use <a href="https://docs.python.org/3/c-api/number.html#c.PyNumber_Add">PyNumber_Add</a> and <a href="https://docs.python.org/3/c-api/number.html#c.PyNumber_Subtract">PyNumber_Subtract</a> for basic arithmetic, and there are many similar helpers available to us for operations down the line.</p> <p>Now we can write a translation:</p> <div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;Python.h&gt;</span> <span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">fib</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">zero</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">one</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyObject_RichCompareBool</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">zero</span><span class="p">,</span><span class="w"> </span><span 
class="n">Py_EQ</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">PyObject_RichCompareBool</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">one</span><span class="p">,</span><span class="w"> </span><span class="n">Py_EQ</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">n</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">one</span><span class="p">));</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">two</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">2</span><span class="p">);</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">two</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Add</span><span class="p">(</span><span class="n">left</span><span 
class="p">,</span><span class="w"> </span><span class="n">right</span><span class="p">);</span> <span class="p">}</span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">*</span><span class="n">argv</span><span class="p">[])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Py_Initialize</span><span class="p">();</span> <span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fib</span><span class="p">(</span><span class="n">PyLong_FromLong</span><span class="p">(</span><span class="mi">7</span><span class="p">));</span><span class="w"> </span><span class="c1">// Should be 13</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyLong_AsLong</span><span class="p">(</span><span class="n">res</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Compile and run it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>gcc<span class="w"> </span>-c<span class="w"> </span>-o<span class="w"> </span>main.o<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--cflags<span class="k">)</span><span class="w"> </span>main.c $<span class="w"> </span>gcc<span class="w"> </span><span class="k">$(</span>python3-config<span class="w"> </span>--ldflags<span class="k">)</span><span class="w"> </span>main.o $<span class="w"> </span>./a.out<span class="p">;</span><span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">13</span> </pre></div> <p>That's great! 
But we cheated in one place. We assumed that the input to the <code>fib</code> function was an integer, and we propagated that assumption everywhere we wrote <code>PyNumber_*</code> operations. When we write the compiler, we'll need to check that both arguments are integers before we call a numeric helper; otherwise we may need to call a string concatenation helper or something else entirely.</p> <h3 id="compiler-architecture">Compiler Architecture</h3><p>We'll break the code into four major parts:</p> <ol> <li><code>libpyc.c</code>: helper functions for generated code</li> <li><code>pyc/context.py</code>: utilities for scope and writing code in memory</li> <li><code>pyc/codegen.py</code>: for generating C code from a Python AST</li> <li><code>pyc/__main__.py</code>: the entrypoint</li> </ol> <p class="note"> When I'm writing a new compiler using an existing parser I almost always start with the entrypoint and code generator so I can explore the AST. However, it's easiest to explain the code if we start with the utilities first. </p><p>We'll also want an empty <code>pyc/__init__.py</code>.</p> <h3 id="libpyc.c">libpyc.c</h3><p>This C file will contain three helper functions for safely adding, subtracting, and printing. It will be concatenated to the top of the generated C file. 
We'll only support integers for now but this structure sets us up for supporting more types later on.</p> <p>We'll use <a href="https://docs.python.org/3/c-api/long.html#c.PyLong_Check">PyLong_Check</a> before calling number-specific methods.</p> <div class="highlight"><pre><span></span><span class="cp">#define PY_SSIZE_T_CLEAN</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;Python.h&gt;</span> <span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">PYC_Add</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: allow __add__ override</span> <span class="w"> </span><span class="c1">// Includes ints and bools</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">l</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">r</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Add</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// TODO: handle str, etc.</span> <span class="w"> </span><span class="c1">// TODO: throw 
exception</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="nf">PYC_Sub</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: allow __sub__ override</span> <span class="w"> </span><span class="c1">// Includes ints and bools</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">l</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">PyLong_Check</span><span class="p">(</span><span class="n">r</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">PyNumber_Subtract</span><span class="p">(</span><span class="n">l</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// TODO: handle str, etc.</span> <span class="w"> </span><span class="c1">// TODO: throw exception</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">NULL</span><span class="p">;</span> <span class="p">}</span> <span class="kr">inline</span><span class="w"> </span><span class="n">PyObject</span><span 
class="o">*</span><span class="w"> </span><span class="nf">PYC_Print</span><span class="p">(</span><span class="n">PyObject</span><span class="o">*</span><span class="w"> </span><span class="n">o</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">PyObject_Print</span><span class="p">(</span><span class="n">o</span><span class="p">,</span><span class="w"> </span><span class="n">stdout</span><span class="p">,</span><span class="w"> </span><span class="n">Py_PRINT_RAW</span><span class="p">);</span> <span class="w"> </span><span class="n">printf</span><span class="p">(</span><span class="s">&quot;</span><span class="se">\n</span><span class="s">&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">Py_None</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>That's it! We could generate these as strings in Python but it gets hairy to do so. By using a dedicated C file, we can take advantage of syntax highlighting since this file is only C code. And since we've marked all functions as <code>inline</code>, there's no runtime cost to keeping them in a separate file rather than embedding them as strings in Python.</p> <h3 id="pyc/context.py">pyc/context.py</h3><p>This file will contain a <code>Context</code> class for managing identifiers in scope and for proxying to a <code>Writer</code> class that contains helpers for writing lines of C code.</p> <p>We'll have two instances of the <code>Writer</code> class in <code>Context</code> so that we can write to a body (or current/primary) region and an initialization region.</p> <p>The initialization region is necessary in case there are any variables declared at the top level. We can't initialize these variables in C outside of a function since every <code>PyObject*</code> must be created after calling <code>Py_Initialize</code>. 
This section will be written into our C <code>main</code> function before we enter a compiled Python <code>main</code> function.</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">copy</span> <span class="k">class</span> <span class="nc">Writer</span><span class="p">():</span> <span class="n">content</span> <span class="o">=</span> <span class="s2">&quot;&quot;</span> <span class="k">def</span> <span class="nf">write</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">content</span> <span class="o">+=</span> <span class="p">(</span><span class="s2">&quot; &quot;</span> <span class="o">*</span> <span class="n">indent</span><span class="p">)</span> <span class="o">+</span> <span class="n">exp</span> <span class="k">def</span> <span class="nf">writeln</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">stmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">stmt</span> <span class="o">+</span> <span class="s2">&quot;</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">,</span> <span class="n">indent</span><span class="p">)</span> <span class="k">def</span> <span class="nf">write_statement</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span 
class="n">stmt</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">indent</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">0</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">writeln</span><span class="p">(</span><span class="n">stmt</span> <span class="o">+</span> <span class="s2">&quot;;&quot;</span><span class="p">,</span> <span class="n">indent</span><span class="p">)</span> <span class="k">class</span> <span class="nc">Context</span><span class="p">():</span> <span class="n">initializations</span> <span class="o">=</span> <span class="n">Writer</span><span class="p">()</span> <span class="n">body</span> <span class="o">=</span> <span class="n">Writer</span><span class="p">()</span> <span class="n">indentation</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">scope</span> <span class="o">=</span> <span class="mi">0</span> <span class="n">ret</span> <span class="o">=</span> <span class="kc">None</span> <span class="n">namings</span> <span class="o">=</span> <span class="p">{}</span> <span class="n">counter</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span> <span class="k">def</span> <span class="fm">__getattr__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">object</span><span class="p">:</span> <span class="c1"># Helpers to avoid passing in self.indentation every time</span> <span class="n">outputs</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;initializations&quot;</span><span class="p">,</span> <span class="s2">&quot;body&quot;</span><span class="p">]</span> <span class="k">for</span> <span class="n">output</span> <span class="ow">in</span> <span 
class="n">outputs</span><span class="p">:</span> <span class="k">if</span> <span class="n">name</span><span class="o">.</span><span class="n">startswith</span><span class="p">(</span><span class="n">output</span><span class="p">):</span> <span class="k">return</span> <span class="k">lambda</span> <span class="n">s</span><span class="p">,</span> <span class="n">i</span><span class="o">=</span><span class="kc">None</span><span class="p">:</span> <span class="nb">getattr</span><span class="p">(</span><span class="nb">getattr</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">output</span><span class="p">),</span> <span class="n">name</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">output</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">:])(</span><span class="n">s</span><span class="p">,</span> <span class="n">i</span> <span class="k">if</span> <span class="n">i</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span> <span class="k">else</span> <span class="bp">self</span><span class="o">.</span><span class="n">indentation</span><span class="p">)</span> <span class="k">return</span> <span class="nb">object</span><span class="o">.</span><span class="fm">__getattribute__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="k">def</span> <span class="nf">get_local</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">source_name</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">dict</span><span class="p">:</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">source_name</span><span 
class="p">]</span> <span class="k">def</span> <span class="nf">register_global</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">loc</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">name</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="n">loc</span><span class="p">,</span> <span class="s2">&quot;scope&quot;</span><span class="p">:</span> <span class="mi">0</span><span class="p">,</span> <span class="p">}</span> <span class="k">def</span> <span class="nf">register_local</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">local</span><span class="p">:</span> <span class="nb">str</span> <span class="o">=</span> <span class="s2">&quot;tmp&quot;</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">counter</span> <span class="o">+=</span> <span class="mi">1</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">local</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&quot;name&quot;</span><span class="p">:</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">local</span><span class="si">}</span><span class="s2">_</span><span class="si">{</span><span class="bp">self</span><span class="o">.</span><span class="n">counter</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">,</span> <span 
class="c1"># naming dictionary is copied, so we need to capture scope</span> <span class="c1"># at declaration</span> <span class="s2">&quot;scope&quot;</span><span class="p">:</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope</span><span class="p">,</span> <span class="p">}</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">namings</span><span class="p">[</span><span class="n">local</span><span class="p">][</span><span class="s2">&quot;name&quot;</span><span class="p">]</span> <span class="k">def</span> <span class="nf">copy</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="n">new</span> <span class="o">=</span> <span class="n">copy</span><span class="o">.</span><span class="n">copy</span><span class="p">(</span><span class="bp">self</span><span class="p">)</span> <span class="c1"># For some reason copy.deepcopy doesn&#39;t do this</span> <span class="n">new</span><span class="o">.</span><span class="n">namings</span> <span class="o">=</span> <span class="nb">dict</span><span class="p">(</span><span class="n">new</span><span class="o">.</span><span class="n">namings</span><span class="p">)</span> <span class="k">return</span> <span class="n">new</span> <span class="k">def</span> <span class="nf">at_toplevel</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span> <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">scope</span> <span class="o">==</span> <span class="mi">0</span> </pre></div> <p>This is all pretty boring boilerplate. 
Let's move on.</p> <h3 id="pyc/<strong>main</strong>.py">pyc/__main__.py</h3><p>The entrypoint is responsible for reading source code, parsing it, calling the code generator, writing the generated code to a C file, and compiling it.</p> <p>First, we read and parse the source code:</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="nn">shutil</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span> <span class="kn">from</span> <span class="nn">codegen</span> <span class="kn">import</span> <span class="n">generate</span> <span class="n">BUILTINS</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&quot;print&quot;</span><span class="p">:</span> <span class="s2">&quot;PYC_Print&quot;</span><span class="p">,</span> <span class="p">}</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="n">target</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">target</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">source</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="n">tree</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span 
class="n">target</span><span class="p">)</span> </pre></div> <p>Then we write <code>libpyc.c</code> into the body, register builtins, and run code generation:</p> <div class="highlight"><pre><span></span><span class="o">...</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="o">...</span> <span class="n">ctx</span> <span class="o">=</span> <span class="n">Context</span><span class="p">()</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;libpyc.c&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="o">+</span> <span class="s2">&quot;</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span> <span class="ow">in</span> <span class="n">BUILTINS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_global</span><span class="p">(</span><span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span><span class="p">)</span> <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tree</span><span class="p">)</span> </pre></div> <p>Next, we create a clean output directory and write <code>main.c</code> with the generated code and a <code>main</code> function to initialize Python and any global variables:</p> <div class="highlight"><pre><span></span><span class="o">...</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span 
class="o">...</span> <span class="c1"># Create and move to working directory</span> <span class="n">outdir</span> <span class="o">=</span> <span class="s2">&quot;bin&quot;</span> <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;main.c&quot;</span><span class="p">,</span> <span class="s2">&quot;w&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">ctx</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">content</span><span class="p">)</span> <span class="n">main</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">namings</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;main&quot;</span><span class="p">)[</span><span class="s2">&quot;name&quot;</span><span class="p">]</span> <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;&quot;&quot;int main(int argc, char *argv[]) </span><span class="se">{{</span> <span class="s2"> Py_Initialize();</span> <span class="s2"> // Initialize globals, if any.</span> <span class="si">{</span><span 
class="n">ctx</span><span class="o">.</span><span class="n">initializations</span><span class="o">.</span><span class="n">content</span><span class="si">}</span> <span class="s2"> PyObject* r = </span><span class="si">{</span><span class="n">main</span><span class="si">}</span><span class="s2">();</span> <span class="s2"> return PyLong_AsLong(r);</span> <span class="se">}}</span><span class="s2">&quot;&quot;&quot;</span><span class="p">)</span> </pre></div> <p>Finally, we run <code>clang-format</code> and <code>gcc</code> against the generated C code:</p> <div class="highlight"><pre><span></span><span class="o">...</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="o">...</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">&quot;clang-format&quot;</span><span class="p">,</span> <span class="s2">&quot;-i&quot;</span><span class="p">,</span> <span class="s2">&quot;main.c&quot;</span><span class="p">])</span> <span class="n">cflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">&quot;python3-config&quot;</span><span class="p">,</span> <span class="s2">&quot;--cflags&quot;</span><span class="p">])</span> <span class="n">cflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">cflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span 
class="p">()]</span> <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;gcc&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;-o&quot;</span><span class="p">,</span> <span class="s2">&quot;main.o&quot;</span><span class="p">]</span> <span class="o">+</span> <span class="n">cflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">&quot;main.c&quot;</span><span class="p">]</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span> <span class="n">ldflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">&quot;python3-config&quot;</span><span class="p">,</span> <span class="s2">&quot;--ldflags&quot;</span><span class="p">])</span> <span class="n">ldflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">ldflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span> <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;gcc&quot;</span><span class="p">]</span> <span class="o">+</span> <span class="n">ldflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">&quot;main.o&quot;</span><span class="p">]</span> <span class="n">subprocess</span><span 
class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span> </pre></div> <p>All together:</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span> <span class="kn">import</span> <span class="nn">os</span> <span class="kn">import</span> <span class="nn">subprocess</span> <span class="kn">import</span> <span class="nn">shutil</span> <span class="kn">import</span> <span class="nn">sys</span> <span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span> <span class="kn">from</span> <span class="nn">codegen</span> <span class="kn">import</span> <span class="n">generate</span> <span class="n">BUILTINS</span> <span class="o">=</span> <span class="p">{</span> <span class="s2">&quot;print&quot;</span><span class="p">:</span> <span class="s2">&quot;PYC_Print&quot;</span><span class="p">,</span> <span class="p">}</span> <span class="k">def</span> <span class="nf">main</span><span class="p">():</span> <span class="n">target</span> <span class="o">=</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="n">target</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">source</span> <span class="o">=</span> <span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="n">tree</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">target</span><span class="p">)</span> <span class="n">ctx</span> <span class="o">=</span> <span class="n">Context</span><span 
class="p">()</span> <span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;libpyc.c&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">()</span> <span class="o">+</span> <span class="s2">&quot;</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span> <span class="ow">in</span> <span class="n">BUILTINS</span><span class="o">.</span><span class="n">items</span><span class="p">():</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_global</span><span class="p">(</span><span class="n">builtin</span><span class="p">,</span> <span class="n">fn</span><span class="p">)</span> <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tree</span><span class="p">)</span> <span class="c1"># Create and move to working directory</span> <span class="n">outdir</span> <span class="o">=</span> <span class="s2">&quot;bin&quot;</span> <span class="n">shutil</span><span class="o">.</span><span class="n">rmtree</span><span class="p">(</span><span class="n">outdir</span><span class="p">,</span> <span class="n">ignore_errors</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">mkdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span> <span class="n">os</span><span class="o">.</span><span class="n">chdir</span><span class="p">(</span><span class="n">outdir</span><span class="p">)</span> <span class="k">with</span> <span 
class="nb">open</span><span class="p">(</span><span class="s2">&quot;main.c&quot;</span><span class="p">,</span> <span class="s2">&quot;w&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span> <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">ctx</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span class="n">content</span><span class="p">)</span> <span class="n">main</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">namings</span><span class="o">.</span><span class="n">get</span><span class="p">(</span><span class="s2">&quot;main&quot;</span><span class="p">)[</span><span class="s2">&quot;name&quot;</span><span class="p">]</span> <span class="n">f</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;&quot;&quot;int main(int argc, char *argv[]) </span><span class="se">{{</span> <span class="s2"> Py_Initialize();</span> <span class="s2"> // Initialize globals, if any.</span> <span class="si">{</span><span class="n">ctx</span><span class="o">.</span><span class="n">initializations</span><span class="o">.</span><span class="n">content</span><span class="si">}</span> <span class="s2"> PyObject* r = </span><span class="si">{</span><span class="n">main</span><span class="si">}</span><span class="s2">();</span> <span class="s2"> return PyLong_AsLong(r);</span> <span class="se">}}</span><span class="s2">&quot;&quot;&quot;</span><span class="p">)</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">([</span><span class="s2">&quot;clang-format&quot;</span><span class="p">,</span> <span class="s2">&quot;-i&quot;</span><span class="p">,</span> <span class="s2">&quot;main.c&quot;</span><span class="p">])</span> <span class="n">cflags_raw</span> <span 
class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">&quot;python3-config&quot;</span><span class="p">,</span> <span class="s2">&quot;--cflags&quot;</span><span class="p">])</span> <span class="n">cflags</span> <span class="o">=</span> <span class="p">[</span><span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">cflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span> <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;gcc&quot;</span><span class="p">,</span> <span class="s2">&quot;-c&quot;</span><span class="p">,</span> <span class="s2">&quot;-o&quot;</span><span class="p">,</span> <span class="s2">&quot;main.o&quot;</span><span class="p">]</span> <span class="o">+</span> <span class="n">cflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">&quot;main.c&quot;</span><span class="p">]</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span> <span class="n">ldflags_raw</span> <span class="o">=</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">check_output</span><span class="p">([</span><span class="s2">&quot;python3-config&quot;</span><span class="p">,</span> <span class="s2">&quot;--ldflags&quot;</span><span class="p">])</span> <span class="n">ldflags</span> <span class="o">=</span> <span class="p">[</span><span 
class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()</span> <span class="k">for</span> <span class="n">f</span> <span class="ow">in</span> <span class="n">ldflags_raw</span><span class="o">.</span><span class="n">decode</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s2">&quot; &quot;</span><span class="p">)</span> <span class="k">if</span> <span class="n">f</span><span class="o">.</span><span class="n">strip</span><span class="p">()]</span> <span class="n">cmd</span> <span class="o">=</span> <span class="p">[</span><span class="s2">&quot;gcc&quot;</span><span class="p">]</span> <span class="o">+</span> <span class="n">ldflags</span> <span class="o">+</span> <span class="p">[</span><span class="s2">&quot;main.o&quot;</span><span class="p">]</span> <span class="n">subprocess</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">cmd</span><span class="p">)</span> <span class="n">main</span><span class="p">()</span> </pre></div> <p>Done!</p> <h3 id="pyc/codegen.py">pyc/codegen.py</h3><p>Lastly, we write the translation layer from Python AST to C. We'll break this out into 10 helper functions. It is helpful to have the <a href="https://docs.python.org/3/library/ast.html#abstract-grammar">AST spec</a> for reference.</p> <h4 id="1/10:-generate">1/10: generate</h4><p>The entrypoint of the code generator is <code>generate(ctx: Context, module)</code>. It generates code for any object with a <code>body</code> attribute storing a list of statements. 
This function will generate code for objects like modules, function bodies, if bodies, etc.</p> <p>The statements we'll support to begin with are:</p> <ul> <li><code>ast.Assign</code></li> <li><code>ast.FunctionDef</code></li> <li><code>ast.Return</code></li> <li><code>ast.If</code></li> <li>and <code>ast.Expr</code></li> </ul> <p>For each statement, we'll simply pass on generation to an associated helper function. For expression statements, though, we'll also emit a no-op on the result of the expression; otherwise the C compiler will complain about an unused variable.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">module</span><span class="p">):</span> <span class="k">for</span> <span class="n">stmt</span> <span class="ow">in</span> <span class="n">module</span><span class="o">.</span><span class="n">body</span><span class="p">:</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Assign</span><span class="p">):</span> <span class="n">generate_assign</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">FunctionDef</span><span class="p">):</span> <span class="n">generate_function_def</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span 
class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Return</span><span class="p">):</span> <span class="n">generate_return</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">If</span><span class="p">):</span> <span class="n">generate_if</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">stmt</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Expr</span><span class="p">):</span> <span class="n">r</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">&quot;// noop to hide unused warning&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2"> += 0&quot;</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Unsupported statement type: </span><span class="si">{</span><span 
class="nb">type</span><span class="p">(</span><span class="n">stmt</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> </pre></div> <p class="note"> Remember to throw exceptions aggressively, otherwise you'll have a bad time debugging programs using new syntax. </p><p>Let's dig into these helpers.</p> <h4 id="2/10:-generate_assign">2/10: generate_assign</h4><p>To generate assignment code, we need to check if we're at the top-level or not. If we're at the top-level we can declare the variable but we can't initialize it yet, because C only allows constant initializers at file scope. So we add the initialization code to the <code>initializations</code> section of the program, which is emitted inside <code>main</code>.</p> <p>If we're not at the top-level, we can declare and assign in one statement.</p> <p>Before doing either though, we register the variable name so we can get a safe local name to use in generated code. Then we compile the right-hand side so we can assign it to the left-hand side.</p> <div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">ast</span> <span class="kn">from</span> <span class="nn">context</span> <span class="kn">import</span> <span class="n">Context</span> <span class="k">def</span> <span class="nf">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">name</span><span class="p">:</span> <span class="nb">str</span><span class="p">,</span> <span class="n">val</span><span class="p">:</span> <span class="nb">str</span><span class="p">):</span> <span class="k">if</span> <span class="n">ctx</span><span class="o">.</span><span class="n">at_toplevel</span><span class="p">():</span> <span class="n">decl</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">&quot;</span> <span class="n">ctx</span><span 
class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="n">decl</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="n">init</span> <span class="o">=</span> <span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">val</span><span class="si">}</span><span class="s2">&quot;</span> <span class="n">ctx</span><span class="o">.</span><span class="n">initializations_write_statement</span><span class="p">(</span><span class="n">init</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">val</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">generate_assign</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">stmt</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Assign</span><span class="p">):</span> <span class="c1"># TODO: support assigning to a tuple</span> <span class="n">local</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">stmt</span><span class="o">.</span><span class="n">targets</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">.</span><span class="n">id</span><span class="p">)</span> <span class="n">val</span> <span 
class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">stmt</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="n">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">local</span><span class="p">,</span> <span class="n">val</span><span class="p">)</span> </pre></div> <p>We're going to need to implement <code>generate_expression</code> to make this work.</p> <h4 id="3/10:-generate_expression">3/10: generate_expression</h4><p>Just like for statements in <code>generate</code>, there are a few kinds of expressions we need to implement:</p> <ul> <li><code>ast.Num</code></li> <li><code>ast.BinOp</code></li> <li><code>ast.BoolOp</code></li> <li><code>ast.Name</code></li> <li><code>ast.Compare</code></li> <li>and <code>ast.Call</code></li> </ul> <p>For <code>ast.Num</code>, we just need to wrap the literal number in a Python integer object (a <code>PyObject*</code>) using <code>PyLong_FromLong</code>. And for <code>ast.Name</code> we just need to look up the local name in context. 
Otherwise we delegate to more helper functions.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Num</span><span class="p">):</span> <span class="c1"># TODO: deal with non-integers</span> <span class="n">tmp</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">&quot;num&quot;</span><span class="p">)</span> <span class="n">initialize_variable</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="sa">f</span><span class="s2">&quot;PyLong_FromLong(</span><span class="si">{</span><span class="n">exp</span><span class="o">.</span><span class="n">n</span><span class="si">}</span><span class="s2">)&quot;</span><span class="p">)</span> <span class="k">return</span> <span class="n">tmp</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">BinOp</span><span class="p">):</span> <span class="k">return</span> <span class="n">generate_bin_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span 
class="n">ast</span><span class="o">.</span><span class="n">BoolOp</span><span class="p">):</span> <span class="k">return</span> <span class="n">generate_bool_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Name</span><span class="p">):</span> <span class="k">return</span> <span class="n">ctx</span><span class="o">.</span><span class="n">get_local</span><span class="p">(</span><span class="n">exp</span><span class="o">.</span><span class="n">id</span><span class="p">)[</span><span class="s2">&quot;name&quot;</span><span class="p">]</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Compare</span><span class="p">):</span> <span class="k">return</span> <span class="n">generate_compare</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">exp</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Call</span><span class="p">):</span> <span class="k">return</span> <span class="n">generate_call</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Unsupported expression: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">exp</span><span 
class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> </pre></div> <p>Every code generation helper for an expression stores its result in a local variable and returns the variable's name so that parent nodes in the AST can refer to the child. This can produce inefficient code (useless assignments), but that's not really a big deal for a project like this, and GCC will likely optimize them away anyway. The more annoying aspect is that useless assignments make the generated code harder to read.</p> <h4 id="4/10:-generate_bin_op">4/10: generate_bin_op</h4><p>For binary operators we need to support addition and subtraction. Other operators, like equality or <code>and</code>/<code>or</code>, are represented by <code>ast.Compare</code> and <code>ast.BoolOp</code>.</p> <p>This is easy to write because we already prepared helpers in <code>libpyc.c</code>: <code>PYC_Sub</code> and <code>PYC_Add</code>.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_bin_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">binop</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">BinOp</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">&quot;binop&quot;</span><span class="p">)</span> <span class="n">l</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">binop</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="n">r</span> <span class="o">=</span> <span 
class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">binop</span><span class="o">.</span><span class="n">right</span><span class="p">)</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Add</span><span class="p">):</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PYC_Add(</span><span class="si">{</span><span class="n">l</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2">)&quot;</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Sub</span><span class="p">):</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PYC_Sub(</span><span class="si">{</span><span class="n">l</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">r</span><span class="si">}</span><span class="s2">)&quot;</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span 
class="sa">f</span><span class="s2">&quot;Unsupported binary operator: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">binop</span><span class="o">.</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">return</span> <span class="n">result</span> </pre></div> <p>Easy enough.</p> <h4 id="5/10:-generate_bool_op">5/10: generate_bool_op</h4><p>We only need to support <code>or</code> for the Fibonacci program, but <code>or</code> in Python is more complicated than in C. In Python, the first truthy value short-circuits the expression and becomes its result, rather than <code>True</code>. If no value is truthy, the result is the last value.</p> <p>We'll use <code>goto</code> to short-circuit and we'll use <a href="https://docs.python.org/3/c-api/object.html#c.PyObject_IsTrue">PyObject_IsTrue</a> to do the truthy check:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_bool_op</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">boolop</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">BoolOp</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">&quot;boolop&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">if</span> <span 
class="nb">isinstance</span><span class="p">(</span><span class="n">boolop</span><span class="o">.</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Or</span><span class="p">):</span> <span class="n">done_or</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">&quot;done_or&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">exp</span> <span class="ow">in</span> <span class="n">boolop</span><span class="o">.</span><span class="n">values</span><span class="p">:</span> <span class="n">v</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;if (PyObject_IsTrue(</span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">)) </span><span class="se">{{</span><span class="s2">&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;goto </span><span class="si">{</span><span class="n">done_or</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">,</span> <span 
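class="n">ctx</span><span class="o">.</span><span class="n">indentation</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">&quot;}&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">done_or</span><span class="si">}</span><span class="s2">:</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Unsupported boolean operator: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span class="n">boolop</span><span class="o">.</span><span class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">return</span> <span class="n">result</span> </pre></div> <p class="note"> Now that I write this down, I see we could probably move this function into <code>libpyc.c</code> if we used a loop. Maybe in the next iteration. </p><p>We move on.</p> <h4 id="6/10:-generate_compare">6/10: generate_compare</h4><p>This function handles equality and inequality checks. We'll adapt the <code>PyObject_RichCompareBool</code> call we used in the hand-written translation, switching to <code>PyObject_RichCompare</code> so that the result is a <code>PyObject*</code>.</p> <p>The only additional thing to keep in mind is that the right-hand side is passed as a list (<code>comparators</code>), alongside a list of operators (<code>ops</code>). 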
So we have to iterate through it and apply the equality/inequality check on all objects in the list. Note that this folds the comparisons left to right, so a chained comparison like <code>a == b == c</code> is compiled as <code>(a == b) == c</code> rather than Python's <code>a == b and b == c</code>. This only differs for chained comparisons.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_compare</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Compare</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="n">result</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="s2">&quot;compare&quot;</span><span class="p">)</span> <span class="n">left</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">left</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">left</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span><span class="p">,</span> <span class="n">op</span> <span class="ow">in</span> <span class="nb">enumerate</span><span class="p">(</span><span class="n">exp</span><span class="o">.</span><span class="n">ops</span><span class="p">):</span> <span class="n">v</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span 
class="o">.</span><span class="n">comparators</span><span class="p">[</span><span class="n">i</span><span class="p">])</span> <span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">Eq</span><span class="p">):</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PyObject_RichCompare(</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">, Py_EQ)&quot;</span><span class="p">)</span> <span class="k">elif</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">op</span><span class="p">,</span> <span class="n">ast</span><span class="o">.</span><span class="n">NotEq</span><span class="p">):</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2"> = PyObject_RichCompare(</span><span class="si">{</span><span class="n">result</span><span class="si">}</span><span class="s2">, </span><span class="si">{</span><span class="n">v</span><span class="si">}</span><span class="s2">, Py_NE)&quot;</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;Unsupported comparison: </span><span class="si">{</span><span class="nb">type</span><span class="p">(</span><span 
class="n">op</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span> <span class="k">return</span> <span class="n">result</span> </pre></div> <h4 id="7/10:-generate_call">7/10: generate_call</h4><p>The last expression is simple enough. We compile the call's arguments first, then the function itself, then we call the function with the arguments like any C function. Calling the C function directly will have ramifications for interacting with Python libraries (basically, we won't be able to interact with any) but it's the easiest way to get started.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_call</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Call</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span> <span class="n">args</span> <span class="o">=</span> <span class="s1">&#39;, &#39;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">a</span><span class="p">)</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">exp</span><span class="o">.</span><span class="n">args</span><span class="p">])</span> <span class="n">fun</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">func</span><span class="p">)</span> <span class="n">res</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span 
class="n">register_local</span><span class="p">(</span><span class="s2">&quot;call_result&quot;</span><span class="p">)</span> <span class="c1"># TODO: lambdas and closures need additional work</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span> <span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">res</span><span class="si">}</span><span class="s2"> = </span><span class="si">{</span><span class="n">fun</span><span class="si">}</span><span class="s2">(</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s2">)&quot;</span><span class="p">)</span> <span class="k">return</span> <span class="n">res</span> </pre></div> <p>And that's it for expressions! Just a few more statement helpers to support.</p> <h4 id="8/10:-generate_function_def">8/10: generate_function_def</h4><p>This is a fun one. First we register the function name in scope. Then we copy the context so variables within the function body are contained within the function body. We increment <code>scope</code> so we know we've left the top-level. 
Finally, we compile the body.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_function_def</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">fd</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">FunctionDef</span><span class="p">):</span> <span class="n">name</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">fd</span><span class="o">.</span><span class="n">name</span><span class="p">)</span> <span class="n">childCtx</span> <span class="o">=</span> <span class="n">ctx</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span> <span class="n">args</span> <span class="o">=</span> <span class="s2">&quot;, &quot;</span><span class="o">.</span><span class="n">join</span><span class="p">([</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">childCtx</span><span class="o">.</span><span class="n">register_local</span><span class="p">(</span><span class="n">a</span><span class="o">.</span><span class="n">arg</span><span class="p">)</span><span class="si">}</span><span class="s2">&quot;</span> <span class="k">for</span> <span class="n">a</span> <span class="ow">in</span> <span class="n">fd</span><span class="o">.</span><span class="n">args</span><span class="o">.</span><span class="n">args</span><span class="p">])</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;PyObject* </span><span class="si">{</span><span class="n">name</span><span class="si">}</span><span class="s2">(</span><span class="si">{</span><span class="n">args</span><span class="si">}</span><span class="s2">) 
</span><span class="se">{{</span><span class="s2">&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="n">childCtx</span><span class="o">.</span><span class="n">scope</span> <span class="o">+=</span> <span class="mi">1</span> <span class="n">childCtx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">+=</span> <span class="mi">1</span> <span class="n">generate</span><span class="p">(</span><span class="n">childCtx</span><span class="p">,</span> <span class="n">fd</span><span class="p">)</span> <span class="k">if</span> <span class="ow">not</span> <span class="n">childCtx</span><span class="o">.</span><span class="n">ret</span><span class="p">:</span> <span class="n">childCtx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="s2">&quot;return Py_None&quot;</span><span class="p">)</span> <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">&quot;}</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> </pre></div> <p>The check for <code>childCtx.ret</code> isn't strictly necessary because we could just emit a return even if there already was one. 
Asking <code>generate_return</code> to set this attribute and having <code>generate_function_def</code> check it just makes the generated code a little prettier.</p> <h4 id="9/10:-generate_return">9/10: generate_return</h4><p>Very straightforward: we just compile the value to be returned and then we emit a <code>return</code> statement.</p> <p>We store the return value so that the function definition can know whether to add a <code>return Py_None</code> statement.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_return</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">r</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">Return</span><span class="p">):</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_write_statement</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;return </span><span class="si">{</span><span class="n">ctx</span><span class="o">.</span><span class="n">ret</span><span class="si">}</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div> <p>And we've got one last statement to support!</p> <h4 id="10/10:-generate_if">10/10: generate_if</h4><p>You know the deal: compile the test and if the test is truthy, enter the compiled body. 
We'll deal with the else body another time.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">generate_if</span><span class="p">(</span><span class="n">ctx</span><span class="p">:</span> <span class="n">Context</span><span class="p">,</span> <span class="n">exp</span><span class="p">:</span> <span class="n">ast</span><span class="o">.</span><span class="n">If</span><span class="p">):</span>
    <span class="n">test</span> <span class="o">=</span> <span class="n">generate_expression</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="o">.</span><span class="n">test</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;if (PyObject_IsTrue(</span><span class="si">{</span><span class="n">test</span><span class="si">}</span><span class="s2">)) </span><span class="se">{{</span><span class="s2">&quot;</span><span class="p">)</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">generate</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">exp</span><span class="p">)</span>
    <span class="c1"># TODO: handle exp.orelse</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">indentation</span> <span class="o">-=</span> <span class="mi">1</span>
    <span class="n">ctx</span><span class="o">.</span><span class="n">body_writeln</span><span class="p">(</span><span class="s2">&quot;}</span><span class="se">\n</span><span class="s2">&quot;</span><span class="p">)</span>
</pre></div> <p>And we're done with the compiler!</p> <h3 id="trying-it-out">Trying it out</h3><p>As promised:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> 
</span>tests/recursive_fib.py def<span class="w"> </span>fib<span class="o">(</span>n<span class="o">)</span>: <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">0</span><span class="w"> </span>or<span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span>: <span class="w"> </span><span class="k">return</span><span class="w"> </span>n <span class="w"> </span><span class="k">return</span><span class="w"> </span>fib<span class="o">(</span>n<span class="w"> </span>-<span class="w"> </span><span class="m">1</span><span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>n<span class="w"> </span>-<span class="w"> </span><span class="m">2</span><span class="o">)</span> def<span class="w"> </span>main<span class="o">()</span>: <span class="w"> </span>print<span class="o">(</span>fib<span class="o">(</span><span class="m">40</span><span class="o">))</span> $<span class="w"> </span>python3<span class="w"> </span>pyc<span class="w"> </span>tests/recursive_fib.py $<span class="w"> </span>./bin/a.out <span class="m">102334155</span> </pre></div> <h4 id="microbenchmarking,-or-making-compiler-twitter-unhappy">Microbenchmarking, or making compiler Twitter unhappy</h4><p>Keep in mind this implementation does a small fraction of what CPython is doing.</p> <p>If you time the generated code:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>python3<span class="w"> </span>pyc<span class="w"> </span>tests/recursive_fib.py $<span class="w"> </span><span class="nb">time</span><span class="w"> </span>./bin/a.out <span class="m">102334155</span> ./bin/a.out<span class="w"> </span><span class="m">18</span>.69s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.03s<span class="w"> 
</span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">18</span>.854<span class="w"> </span>total </pre></div> <p>And CPython (with <code>main()</code> appended to the source):</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>python3<span class="w"> </span>tests/recursive_fib.py
<span class="m">102334155</span>
python3<span class="w"> </span>tests/recursive_fib.py<span class="w"> </span><span class="m">76</span>.24s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.11s<span class="w"> </span>system<span class="w"> </span><span class="m">99</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">1</span>:16.81<span class="w"> </span>total
</pre></div> <p>The only reason I mention this is because when I did a <a href="/compiling-dynamic-programming-languages.html#next-steps-with-jsc">similar compiler project for JavaScript targeting C++/libV8</a>, the generated code was about the same or a little slower in speed.</p> <p>I haven't gotten <em>that much</em> better at writing these compilers.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post up, on writing a simple Python to C compiler (in Python).<a href="https://t.co/4kkji0XXbp">https://t.co/4kkji0XXbp</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1295134027335204865?ref_src=twsrc%5Etfw">August 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/writing-a-simple-python-compiler.htmlSun, 16 Aug 2020 00:00:00 +0000A single-node Kubernetes cluster without virtualization or a container registryhttp://notes.eatonphil.com/a-single-node-kubernetes-cluster-without-virtualization-or-a-container-registry.html<p>This post is a recipe for setting up a minimal Kubernetes cluster on Fedora without requiring 
virtualization or a container registry. These two features make the system cloud-agnostic and the cluster entirely self-contained. The post will end with us running a simple Flask app from a local container.</p> <p>This setup is primarily useful for simple CI environments or application development on Linux. (Docker Desktop has better tooling for development on Mac or Windows.)</p> <h3 id="getting-kubernetes">Getting Kubernetes</h3><p>The core of this effort is <a href="https://k3s.io/">K3s</a>, a Kubernetes distribution that allows us to run on a single node without virtualization.</p> <p>But first off, <a href="https://docs.docker.com/engine/install/fedora/">install Docker</a>.</p> <p>Then install K3s:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-sfL<span class="w"> </span>https://get.k3s.io<span class="w"> </span><span class="p">|</span><span class="w"> </span>sh<span class="w"> </span>- </pre></div> <p>It may prompt you to adjust some SELinux policies like so:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>dnf<span class="w"> </span>install<span class="w"> </span>-y<span class="w"> </span>container-selinux<span class="w"> </span>selinux-policy-base $<span class="w"> </span>sudo<span class="w"> </span>rpm<span class="w"> </span>-i<span class="w"> </span>https://rpm.rancher.io/k3s-selinux-0.1.1-rc1.el7.noarch.rpm </pre></div> <p>Swap these out with whatever it prompts and retry the K3s install.</p> <p>Finally, <a href="https://kubernetes.io/docs/tasks/tools/install-kubectl/">install kubectl</a>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-LO<span class="w"> </span>https://storage.googleapis.com/kubernetes-release/release/<span class="sb">`</span>curl<span class="w"> </span>-s<span class="w"> </span>https://storage.googleapis.com/kubernetes-release/release/stable.txt<span 
class="sb">`</span>/bin/linux/amd64/kubectl </pre></div> <p>Now copy the global K3s kubeconfig into <code>~/.kube/config</code>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>cp<span class="w"> </span>/etc/rancher/k3s/k3s.yaml<span class="w"> </span>~/.kube/config $<span class="w"> </span>sudo<span class="w"> </span>chown<span class="w"> </span><span class="nv">$USER</span>:<span class="nv">$GROUP</span><span class="w"> </span>~/.kube/config </pre></div> <p>And enable K3s:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>systemctl<span class="w"> </span><span class="nb">enable</span><span class="w"> </span>k3s </pre></div> <p>If you're on Fedora 31+ you'll need to disable cgroups v2 and reboot:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>grubby<span class="w"> </span>--args<span class="o">=</span><span class="s2">&quot;systemd.unified_cgroup_hierarchy=0&quot;</span><span class="w"> </span>--update-kernel<span class="o">=</span>ALL $<span class="w"> </span>sudo<span class="w"> </span>reboot </pre></div> <p>Finally, you can run <code>kubectl</code>:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>kubectl<span class="w"> </span>get<span class="w"> </span>pods No<span class="w"> </span>resources<span class="w"> </span>found<span class="w"> </span><span class="k">in</span><span class="w"> </span>default<span class="w"> </span>namespace. 
</pre></div> <h3 id="a-simple-application">A simple application</h3><p>We'll create a small Flask app, containerize it, and write a Kubernetes deployment and service config for it.</p> <p>We begin with <code>app.py</code>:</p> <div class="highlight"><pre><span></span><span class="kn">from</span> <span class="nn">flask</span> <span class="kn">import</span> <span class="n">Flask</span> <span class="n">app</span> <span class="o">=</span> <span class="n">Flask</span><span class="p">(</span><span class="vm">__name__</span><span class="p">)</span> <span class="nd">@app</span><span class="o">.</span><span class="n">route</span><span class="p">(</span><span class="s1">&#39;/&#39;</span><span class="p">)</span> <span class="k">def</span> <span class="nf">index</span><span class="p">():</span> <span class="k">return</span> <span class="s1">&#39;Hello World, Flask!&#39;</span> <span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">&#39;__main__&#39;</span><span class="p">:</span> <span class="n">app</span><span class="o">.</span><span class="n">run</span><span class="p">(</span><span class="n">debug</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> </pre></div> <p>Then a <code>Dockerfile</code>:</p> <div class="highlight"><pre><span></span><span class="k">FROM</span><span class="w"> </span><span class="s">python:3-slim</span> <span class="k">RUN</span><span class="w"> </span>pip<span class="w"> </span>install<span class="w"> </span>flask <span class="k">COPY</span><span class="w"> </span>.<span class="w"> </span>/app <span class="k">CMD</span><span class="w"> </span>python3<span class="w"> </span>/app/app.py </pre></div> <p>Then the deployment in <code>manifest.yaml</code>:</p> <div class="highlight"><pre><span></span><span class="nt">apiVersion</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">apps/v1</span> <span class="nt">kind</span><span 
class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Deployment</span> <span class="nt">metadata</span><span class="p">:</span> <span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span> <span class="nt">spec</span><span class="p">:</span> <span class="w"> </span><span class="nt">selector</span><span class="p">:</span> <span class="w"> </span><span class="nt">matchLabels</span><span class="p">:</span> <span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span> <span class="w"> </span><span class="nt">template</span><span class="p">:</span> <span class="w"> </span><span class="nt">metadata</span><span class="p">:</span> <span class="w"> </span><span class="nt">labels</span><span class="p">:</span> <span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span> <span class="w"> </span><span class="nt">spec</span><span class="p">:</span> <span class="w"> </span><span class="nt">containers</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">image</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span> <span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">helloworld</span> </pre></div> <h3 id="running-in-kubernetes">Running in Kubernetes</h3><p>First we build, save, and import the image into <code>k3s</code>:</p> <div class="highlight"><pre><span></span><span class="err">$</span> <span class="n">docker</span> <span class="n">build</span> <span class="o">.</span> <span class="o">-</span><span class="n">t</span> <span 
class="n">helloworld</span>
<span class="err">$</span> <span class="n">docker</span> <span class="n">save</span> <span class="n">helloworld</span> <span class="o">&gt;</span> <span class="n">helloworld</span><span class="o">.</span><span class="n">tar</span>
<span class="err">$</span> <span class="n">sudo</span> <span class="n">k3s</span> <span class="n">ctr</span> <span class="n">image</span> <span class="kn">import</span> <span class="nn">helloworld.tar</span>
<span class="err">$</span> <span class="n">kubectl</span> <span class="n">apply</span> <span class="o">-</span><span class="n">f</span> <span class="o">./</span><span class="n">manifest</span><span class="o">.</span><span class="n">yaml</span>
<span class="err">$</span> <span class="n">kubectl</span> <span class="n">port</span><span class="o">-</span><span class="n">forward</span> <span class="err">$</span><span class="p">(</span><span class="n">kubectl</span> <span class="n">get</span> <span class="n">pods</span> <span class="o">|</span> <span class="n">grep</span> <span class="n">helloworld</span> <span class="o">|</span> <span class="n">cut</span> <span class="o">-</span><span class="n">d</span> <span class="s1">&#39; &#39;</span> <span class="o">-</span><span class="n">f</span> <span class="mi">1</span><span class="p">)</span> <span class="mi">5000</span> <span class="o">&gt;</span> <span class="n">log</span> <span class="mi">2</span><span class="o">&gt;&amp;</span><span class="mi">1</span> <span class="o">&amp;</span>
<span class="err">$</span> <span class="n">curl</span> <span class="n">localhost</span><span class="p">:</span><span class="mi">5000</span>
<span class="n">Hello</span> <span class="n">World</span><span class="p">,</span> <span class="n">Flask</span><span class="err">!</span>
</pre></div> <p>And that's it!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post is a recipe for creating a self-contained, single-node Kubernetes cluster for CI environments using a basic Flask app.<a 
href="https://t.co/fegAZFEQzO">https://t.co/fegAZFEQzO</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1287163839306444800?ref_src=twsrc%5Etfw">July 25, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/a-single-node-kubernetes-cluster-without-virtualization-or-a-container-registry.htmlSat, 25 Jul 2020 00:00:00 +0000Generating a full-stack application from a databasehttp://notes.eatonphil.com/generating-a-full-stack-application-from-a-database.html<p><a href="https://dbcore.org">DBCore</a> can now generate a TypeScript/React CRUD UI that is automatically hooked up to the generated REST API (in Go).</p> <p>The UI has full support for login, viewing (and filtering), editing, and creating database entities.</p> <p>PostgreSQL, SQLite and MySQL are supported.</p> <h3 id="how-to-use?">How to use?</h3><p>The goal of this project is primarily to provide as much useful boilerplate as possible for full-stack applications. The system is probably not sufficient to be an entire application development platform. It's currently missing hooks, overrides, and per-row/per-table authorization.</p> <p>The UI code generation may be even less useful in the long-term than the API because UIs are by necessity very diverse. But it is good not to need to build the same browser-side API, authentication, and routing logic again now that it's taken care of in code generation.</p> <h3 id="screenshots">Screenshots</h3><p>Here are a few screenshots of the examples/todo application. Every page here is auto-generated after reading the database schema. 
The browser application is hooked up to the similarly auto-generated API.</p> <div style="padding-bottom: 15px;"> <small>Sign in</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/1ReEEdf.png"/> </div> <div style="padding-bottom: 15px;"> <small>Creating a table entity</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/AiryzjX.png"/> </div> <div style="padding-bottom: 15px;"> <small>Viewing all table entities</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/l9jI0LA.png"/> </div> <div style="padding-bottom: 15px;"> <small>Filtering table entities</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/J21vQDE.png"/> </div> <div style="padding-bottom: 15px;"> <small>Viewing an individual table entity</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/T2VhBFt.png"/> </div> <div> <small>Editing a table entity</small> <img style="border: 1px solid #ddd;" src="https://i.imgur.com/f2sRN1p.png"> </div><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">What&#39;s new in DBCore: a TypeScript/React UI generated from your database schema and hooked up to the similarly generated Go REST API.<br><br>So you can now generate an entire full stack application from your database schema. 
Screenshots in the post.<a href="https://t.co/BTGRVBsfUR">https://t.co/BTGRVBsfUR</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1272295312900661250?ref_src=twsrc%5Etfw">June 14, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/generating-a-full-stack-application-from-a-database.htmlSun, 14 Jun 2020 00:00:00 +0000Generating a REST API from a databasehttp://notes.eatonphil.com/generating-a-rest-api-from-a-database.html<p>I recently published an <a href="https://eatonphil.github.io/dbcore/">alpha version of a code generation tool, DBCore,</a> that reads a database schema from PostgreSQL or MySQL and generates an entire Go API with CRUD operations, pagination, filtering, and authentication.</p> <p><img src="https://pbs.twimg.com/media/EZJ7TvNXQAEgraD?format=png&name=large" /></p> <p>But more than just generating code like <a href="https://github.com/xo/xo">xo/xo</a> or <a href="https://gnorm.org/">gnorm</a>, DBCore defines a standard REST API that can be implemented in any language -- and includes a reference implementation in Go. I'm eager to add Java and Ruby implementations as well. And I'd be more than happy to accept community contributions.</p> <h3 id="boilerplate-&amp;-code-generation">Boilerplate &amp; code generation</h3><p>Web application boilerplate is boring. You should do it once from scratch (preferably down to the socket layer) and never do it again. I struggled for the last few years to find the right system to reduce boilerplate. If I were building a new line-of-business application as an employee I'd pick one of Rails, ASP.NET, Spring, Django, or similar.</p> <p>I've never worked on one of those frameworks professionally and I've never been able to force myself to learn any of them in my free time. 
But even if I could use one of these, none of them gets close to giving you an entire functioning application with authentication, pagination, and filtering, all based on your existing database.</p> <p>Over the last few years, though, I've relied heavily on code generation for Go projects. Code generation is basically the only way to conserve type-safe code in Go. But it's similarly <a href="https://www.jooq.org/doc/3.13/manual/code-generation/">popular</a> in more powerful languages like Java.</p> <p>However, none of the existing projects give you much flexibility or provide you with enough templates to be useful.</p> <h3 id="dbcore">DBCore</h3><p>DBCore is written in F# and can be distributed as a static binary on all systems .NET now supports (read: not just Windows!).</p> <p>Reading from MySQL or PostgreSQL is supported, but I'd like to see that extended to include SQLite, Oracle, and MS SQL at least.</p> <p>As mentioned, DBCore currently only provides a Go REST API template. That only solves half the problem of building an application, though. And while there are some projects that can generate an admin CRUD interface for you, I want to see that more tightly integrated into DBCore. So I'll be introducing a new template for a browser application as well. 
For each table in the database it will generate a page showing paginated entries and allow you to create, update, and delete.</p> <p>Finally, while the tool only currently has a concept of "browser" and "api" templates, the project should be able to accept any kind of template and generate any text based on any database schema.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New blog post, background and goals for dbcore<a href="https://t.co/XW9gUCtvr0">https://t.co/XW9gUCtvr0</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1269467766727327745?ref_src=twsrc%5Etfw">June 7, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/generating-a-rest-api-from-a-database.htmlSat, 06 Jun 2020 00:00:00 +0000RFCs and asynchronous-first culturehttp://notes.eatonphil.com/rfcs-and-asynchronous-first-culture.html<p>I hated writing documentation before working on features. But after a while I realized I couldn't communicate well enough, even with folks I had a good connection with. It took me a number of mistaken deliveries to get the message.</p> <h3 id="sketches-and-mockups">Sketches and mockups</h3><p>Designers solve this by producing low-fidelity sketches early on in the process, iterating on feedback to produce a high-fidelity mockup. I solve this by producing short RFC (request for comment) documents. This isn't an original idea, but I see it so rarely I wanted to share.</p> <p>Now as soon as I begin thinking about a technical or organizational change, I write an RFC. My RFCs are typically a page or two long and typically take me 30-60 minutes for a good first draft. I make clear in the title that it is a proposal or draft. 
This allows me to make crazy suggestions without upsetting folks; a draft can be easily thrown away.</p> <h3 id="rfc-process">RFC process</h3><p>My RFCs include three key sections:</p> <ol> <li>What I think the problem is</li> <li>Pros/cons of all the solutions I considered</li> <li>Which solution I'm planning to go with if no one responds to the RFC</li> </ol> <p>After I write the first draft I circulate it among a small group of peers I respect, my boss, etc. I ask for feedback at their leisure and I check in every few days with a reminder. If no one responds after a while and there is little concern, I typically move forward with the proposed solution.</p> <p>In addition to clarifying intent up front, this removes the need to schedule a meeting to <em>discuss a problem</em>. Discussion can happen and decisions can be made asynchronously. I only schedule a meeting if there is disagreement that can't be resolved in writing.</p> <p>After incorporating feedback, I either throw away the RFC and move on or, if I feel reasonably confident about the proposal, send it out to a wider group of relevant participants. Final meetings are held as needed.</p> <h3 id="the-other-option">The other option</h3><p>In contrast, synchronous-first and undocumented proposals make some sense when you've got a small team in the same timezone with a similar schedule. Otherwise, you repeatedly reschedule meetings to accommodate everyone. You spend your first few meetings simply coming to understand and agree on <em>the problem</em>.</p> <p>Spending 30-60 minutes to draft a proposal is almost always easier. It makes the decision-making process faster and produces more accurate results.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Spending 30-60 minutes to draft a technical (or organizational) proposal is almost always easier for discussion and action than just scheduling a meeting. 
Or &quot;my asynchronous-first manifesto&quot;<a href="https://t.co/gm4SUzBD2W">https://t.co/gm4SUzBD2W</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1261767623592869896?ref_src=twsrc%5Etfw">May 16, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/rfcs-and-asynchronous-first-culture.htmlSat, 16 May 2020 00:00:00 +0000Writing a SQL database from scratch in Go: 4. a database/sql driverhttp://notes.eatonphil.com/database-basics-a-database-sql-driver.html<p class="note"> Previously in database basics: <! forgive me, for I have sinned > <br /> <a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a> <br /> <a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a> <br /> <a href="/database-basics-indexes.html">3. indexes</a> </p><p>In this post, we'll extend <a href="https://github.com/eatonphil/gosql">gosql</a> to implement the <code>database/sql</code> driver interface. 
This will allow us to interact with gosql the same way we would interact with any other database.</p> <p>Here is a familiar example program (stored in <code>cmd/sqlexample/main.go</code>) we'll be able to run:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;database/sql&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="s">&quot;github.com/eatonphil/gosql&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">db</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Open</span><span class="p">(</span><span class="s">&quot;postgres&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="nx">_</span><span 
class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">&quot;CREATE TABLE users (name TEXT, age INT);&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">&quot;INSERT INTO users VALUES (&#39;Terry&#39;, 45);&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">&quot;INSERT INTO users VALUES (&#39;Anette&#39;, 57);&quot;</span><span 
class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">db</span><span class="p">.</span><span class="nx">Query</span><span class="p">(</span><span class="s">&quot;SELECT name, age FROM users;&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="k">defer</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Close</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Next</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Scan</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">age</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Name: %s, Age: %d\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">rows</span><span class="p">.</span><span class="nx">Err</span><span class="p">();</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Our gosql driver will use a single instance of the 
<code>Backend</code> for all connections.</p> <p>Aside from that, it is a simple matter of wrapping our existing APIs in structs that implement the <code>database/sql/driver.Driver</code> interface.</p> <p>This post is largely a discussion of <a href="https://github.com/eatonphil/gosql/commit/0d0aa61a74580a6aef11296741abfba4e1d4ae5c">this commit</a>.</p> <h3 id="implementing-the-driver">Implementing the driver</h3><p>A driver is registered by calling <code>sql.Register</code> with a driver instance.</p> <p>We'll add the registration code to an <code>init</code> function in a new file, <code>driver.go</code>:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Driver</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="s">&quot;postgres&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Driver</span><span class="p">{</span><span class="nx">NewMemoryBackend</span><span class="p">()})</span> <span class="p">}</span> </pre></div> <p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Driver">Driver interface</a>, we need only implement <code>Open</code> to return a connection instance that implements the <code>database/sql/driver.Conn</code> interface.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Driver</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">d</span><span class="w"> </span><span class="o">*</span><span class="nx">Driver</span><span class="p">)</span><span class="w"> </span><span class="nx">Open</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Conn</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Conn</span><span class="p">{</span><span class="nx">d</span><span class="p">.</span><span class="nx">bkd</span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">init</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">sql</span><span class="p">.</span><span class="nx">Register</span><span class="p">(</span><span class="s">&quot;postgres&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Driver</span><span class="p">{</span><span class="nx">NewMemoryBackend</span><span class="p">()})</span> <span class="p">}</span> </pre></div> <h3 id="implementing-the-connection">Implementing the connection</h3><p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Conn">Conn interface</a>, we must implement:</p> <ul> <li><code>Prepare(query string) (driver.Stmt, error)</code> to handle prepared statements</li> <li><code>Close</code> to handle 
cleanup</li> <li>and <code>Begin</code> to start a transaction</li> </ul> <p>The connection can also optionally implement <code>Query</code> and <code>Exec</code>.</p> <p>To simplify things we'll panic on <code>Prepare</code> and on <code>Begin</code> (we don't have transactions yet). There's no cleanup required so we'll do nothing in <code>Close</code>.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Conn</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">bkd</span><span class="w"> </span><span class="nx">Backend</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Prepare</span><span class="p">(</span><span class="nx">query</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Prepare not implemented&quot;</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Begin</span><span class="p">()</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Tx</span><span 
class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Begin not implemented&quot;</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The only method we actually need, <code>Query</code>, is not required by the interface. It takes a query string and a slice of query parameters, returning an instance implementing the <code>database/sql/driver.Rows</code> interface.</p> <p>To implement <code>Query</code>, we basically copy the logic we had in the <code>cmd/main.go</code> REPL. The only change is that when handling <code>SELECT</code>, we'll return the results as a struct that implements the <code>database/sql/driver.Rows</code> interface.</p> <p class="note"> <code>database/sql/driver.Rows</code> is not the same type as <code>database/sql.Rows</code>, which may sound more familiar. <code>database/sql/driver.Rows</code> is a simpler, lower-level interface. </p><p>If we receive parameterized query arguments, we'll panic for now, since parameterization isn't supported yet. 
And if the query involves multiple statements, we'll process only the first statement.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">dc</span><span class="w"> </span><span class="o">*</span><span class="nx">Conn</span><span class="p">)</span><span class="w"> </span><span class="nx">Query</span><span class="p">(</span><span class="nx">query</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Rows</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: support parameterization</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Parameterization not supported&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Parser</span><span class="p">{}</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">query</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error while parsing: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// NOTE: ignoring all but the first statement</span> <span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">CreateIndexKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">CreateIndex</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span 
class="nx">CreateIndexStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error adding index on table: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">CreateTableKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">CreateTableStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error creating table: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">DropTableKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">DropTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">DropTableStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error dropping table: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">InsertKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">InsertStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> 
</span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Error inserting values: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">SelectKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">dc</span><span class="p">.</span><span class="nx">bkd</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">SelectStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Rows</span><span class="p">{</span> <span class="w"> </span><span class="nx">rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Rows</span><span 
class="p">,</span> <span class="w"> </span><span class="nx">columns</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="p">,</span> <span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="implementing-results">Implementing results</h3><p>According to the <a href="https://pkg.go.dev/database/sql/driver?tab=doc#Rows">Rows interface</a> we must implement:</p> <ul> <li><code>Columns() []string</code> to return an array of column names</li> <li><code>Next(dest []Value) error</code> to populate a row array with the next row's worth of cells</li> <li>and <code>Close() error</code></li> </ul> <p>Our <code>Rows</code> struct will contain the rows and columns as returned from <code>Backend</code>, and will also contain an <code>index</code> field we can use in <code>Next</code> to populate the next row of cells.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">Rows</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="nx">ResultColumn</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kt">uint64</span> <span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span> 
<span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Columns</span><span class="p">()</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Next</span><span class="p">(</span><span class="nx">dest</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{}</span> </pre></div> <p>For <code>Columns</code> we simply need to extract and return the column names from <code>ResultColumn</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Columns</span><span class="p">()</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">Name</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">columns</span> <span class="p">}</span> </pre></div> <p>For <code>Next</code> we need to iterate over each cell in the current row and retrieve its Go value, storing it in <code>dest</code>. 
The <code>dest</code> argument is simply a slice of <code>interface{}</code> values with one slot per column, so we'll need no manual conversion.</p> <p>Once we've moved past the last row, the <code>Next</code> contract is to return <code>io.EOF</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Next</span><span class="p">(</span><span class="nx">dest</span><span class="w"> </span><span class="p">[]</span><span class="nx">driver</span><span class="p">.</span><span class="nx">Value</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">io</span><span class="p">.</span><span class="nx">EOF</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">[</span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="p">]</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">idx</span><span class="p">,</span><span class="w"> 
</span><span class="nx">cell</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">columns</span><span class="p">[</span><span class="nx">idx</span><span class="p">].</span><span class="nx">Type</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">IntType</span><span class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">i</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="k">case</span><span class="w"> </span><span class="nx">TextType</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">s</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">s</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">:</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span 
class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">b</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">dest</span><span class="p">[</span><span class="nx">idx</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">b</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Finally in <code>Close</code> we'll set <code>index</code> to the number of rows to force <code>Next</code> to only ever return <code>io.EOF</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="nx">Close</span><span class="p">()</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">uint64</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">rows</span><span class="p">))</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span 
class="p">}</span> </pre></div> <p>And that's all the changes needed to implement a <code>database/sql</code> driver! See <a href="https://github.com/eatonphil/gosql/commit/0d0aa61a74580a6aef11296741abfba4e1d4ae5c#diff-749da71b40f8ff06fc9e78ce917b0cce">here</a> for <code>driver.go</code> in full.</p> <h3 id="running-the-example">Running the example</h3><p>With the driver in place we can try out the example:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>./cmd/sqlexample/main.go $<span class="w"> </span>./main Name:<span class="w"> </span>Terry,<span class="w"> </span>Age:<span class="w"> </span><span class="m">45</span> Name:<span class="w"> </span>Anette,<span class="w"> </span>Age:<span class="w"> </span><span class="m">57</span> </pre></div> http://notes.eatonphil.com/database-basics-a-database-sql-driver.htmlSun, 10 May 2020 00:00:00 +0000Writing a SQL database from scratch in Go: 3. indexeshttp://notes.eatonphil.com/database-basics-indexes.html<p class="note"> Previously in database basics: <! forgive me, for I have sinned > <br /> <a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a> <br /> <a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a> <br /> <br /> Next in database basics: <br /> <a href="/database-basics-a-database-sql-driver.html">4. 
a database/sql driver</a> </p><p>In this post, we extend <a href="https://github.com/eatonphil/gosql">gosql</a> to support indexes. We focus on the addition of <code>PRIMARY KEY</code> constraints on table creation and some easy optimizations during <code>SELECT</code> statements.</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">cmd</span><span class="o">/</span><span class="n">main</span><span class="p">.</span><span class="k">go</span> <span class="n">Welcome</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">gosql</span><span class="p">.</span> <span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="err">\</span><span class="n">d</span><span class="w"> </span><span class="n">users</span> <span class="k">Table</span><span class="w"> </span><span class="ss">&quot;users&quot;</span> <span class="k">Column</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">Type</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">Nullable</span> <span 
class="c1">---------+---------+-----------</span> <span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">integer</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="k">not</span><span class="w"> </span><span class="k">null</span> <span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">text</span><span class="w"> </span><span class="o">|</span> <span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nb">integer</span><span class="w"> </span><span class="o">|</span> <span class="n">Indexes</span><span class="p">:</span> <span class="w"> </span><span class="ss">&quot;users_pkey&quot;</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span><span class="w"> </span><span class="n">rbtree</span><span class="w"> </span><span class="p">(</span><span class="ss">&quot;id&quot;</span><span class="p">)</span> </pre></div> <p>This post will broadly be a discussion of <a href="https://github.com/eatonphil/gosql/commit/9608511d9888ce3842ec7d1bfa8f77499e8123b2">this commit</a>.</p> <h3 id="what-is-an-index?">What is an index?</h3><p>An index is a mapping of a value to a row in a table. The value is often a column, but it can be many kinds of expressions. Databases typically store indexes in tree structures that provide O(log(n)) lookup time. When <code>SELECT</code>ing and filtering on a column that is indexed, a database can greatly improve lookup time by filtering first on this index. Without an index, a database must do a linear scan for matching rows. 
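</p> <p>To make the difference concrete, here is a minimal sketch (not gosql code) that compares a linear scan with a lookup through a sorted structure. It assumes a sorted slice with binary search as a stand-in for a tree index; the names <code>entry</code>, <code>buildIndex</code>, <code>linearScan</code> and <code>indexLookup</code> are hypothetical and exist only for this illustration.</p>

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical index entry: an indexed value plus the position
// of the row it came from.
type entry struct {
	value int
	row   int
}

// buildIndex copies every value with its row position and sorts by value.
func buildIndex(values []int) []entry {
	entries := make([]entry, 0, len(values))
	for row, v := range values {
		entries = append(entries, entry{value: v, row: row})
	}
	sort.Slice(entries, func(a, b int) bool { return entries[a].value < entries[b].value })
	return entries
}

// linearScan checks rows one by one: O(n) comparisons in the worst case.
func linearScan(values []int, want int) (row, steps int) {
	for i, v := range values {
		steps++
		if v == want {
			return i, steps
		}
	}
	return -1, steps
}

// indexLookup binary-searches the sorted entries: O(log n) comparisons.
func indexLookup(sorted []entry, want int) (row, steps int) {
	pos := sort.Search(len(sorted), func(i int) bool {
		steps++
		return sorted[i].value >= want
	})
	if pos == len(sorted) || sorted[pos].value != want {
		return -1, steps
	}
	return sorted[pos].row, steps
}

func main() {
	// 1000 distinct values in scrambled order, standing in for a column.
	values := make([]int, 1000)
	for i := range values {
		values[i] = (i * 7919) % 1000
	}
	index := buildIndex(values)

	want := values[len(values)-1] // worst case for the linear scan
	scanRow, scanSteps := linearScan(values, want)
	idxRow, idxSteps := indexLookup(index, want)
	fmt.Println("linear scan: row", scanRow, "after", scanSteps, "comparisons")
	fmt.Println("index lookup: row", idxRow, "after", idxSteps, "comparisons")
}
```

<p>A real tree index also keeps inserts cheap, which a sorted slice does not, but the lookup cost grows the same way: roughly log(n) comparisons instead of n.</p> <p>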
That said, if a condition is broad enough, a database may still end up doing a linear scan even with an index.</p> <p>While it may make sense initially to map a value to a row using a hash table for constant lookup times, hash tables don't provide ordering. So this would prevent an index from being applicable on anything but equality checks. For example, <code>SELECT x FROM y WHERE x > 2</code> couldn't use a hash index on <code>x</code>.</p> <p>Indexes in many SQL databases default to a <a href="https://www.cs.cornell.edu/courses/cs3110/2012sp/recitations/rec25-B-trees/rec25.html">B-Tree</a>, which offers efficient ordering of elements. These indexes are thus not constant-time lookups even if filtering on a unique column for a single item. Some databases, <a href="https://www.postgresql.org/docs/current/indexes-types.html">like PostgreSQL</a>, allow you to use a hash-based index instead of a tree. Here the previously listed restrictions apply (i.e. only equality checks will use the index).</p> <h3 id="upgrading-gosql">Upgrading gosql</h3><p>We proceed as follows:</p> <ul> <li>Upgrade table creation to support specifying a primary key<ul> <li>Pick a tree data structure for the index, adding it to the table</li> </ul> </li> <li>Upgrade <code>INSERT</code>s to let any indexes on the table process the new row</li> <li>Upgrade <code>SELECT</code>s to make use of any indexes, if possible</li> </ul> <h3 id="upgrading-table-creation">Upgrading table creation</h3><p>To allow the specification of a single column as the primary key when creating a table, we have to first modify the lexer and parser.</p> <h4 id="lexing/parsing">Lexing/parsing</h4><p>Since we've covered this process a few times already, suffice it to say we make the following key additions:</p> <ul> <li><a href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/lexer.go#L36">Add <code>PRIMARY KEY</code> as a new keyword token to the lexer</a></li> <li><a 
href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/parser.go#L425">Add a check for this token to the parsing of column definitions</a></li> <li><a href="https://github.com/eatonphil/gosql/blob/9608511d9888ce3842ec7d1bfa8f77499e8123b2/ast.go#L98">Modify the AST to store a boolean indicating whether a column is a primary key</a></li> </ul> <h4 id="in-memory-backend">In-memory backend</h4><p>Next we move on to handling a primary key during table creation.</p> <p>Since there are many existing papers and blogs on implementing tree data structures, we will import an open-source implementation. And while most databases use a B-Tree, the most important properties of the tree for our purposes are 1) efficient ordering and 2) optional support for duplicate keys. We go with <a href="https://github.com/petar/GoLLRB">GoLLRB</a>, a left-leaning red-black tree.</p> <p>The full definition of an index now includes:</p> <ul> <li>A name</li> <li>An expression (at first we only support this being an identifier referring to a column)</li> <li>A unique flag</li> <li>A type name (it will just be <code>rbtree</code> for now)</li> <li>A primary key flag (so we know to apply null checks among other things)</li> <li>And the actual tree itself</li> </ul> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span> <span class="w"> </span><span class="nx">unique</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">tree</span><span class="w"> </span><span 
class="o">*</span><span class="nx">llrb</span><span class="p">.</span><span class="nx">LLRB</span> <span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="kt">string</span> <span class="p">}</span> </pre></div> <p>When we create a table, we add an index if one of the columns is a primary key. We call out to a new public method, <code>CreateIndex</code>, that will handle actually setting things up.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">];</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableAlreadyExists</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span> <span class="w"> 
</span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span> <span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;int&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;text&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;boolean&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">BoolType</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span 
class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrPrimaryKeyAlreadyExists</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">expression</span><span class="p">{</span> <span class="w"> </span><span class="nx">literal</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">CreateIndex</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">CreateIndexStatement</span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">,</span> <span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s">&quot;_pkey&quot;</span><span class="p">},</span> <span class="w"> </span><span class="nx">unique</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="p">,</span> <span class="w"> </span><span class="nx">exp</span><span class="p">:</span><span class="w"> </span><span 
class="o">*</span><span class="nx">primaryKey</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">delete</span><span class="p">(</span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Implementing <code>CreateIndex</code> is just a matter of adding a new index to the table.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateIndex</span><span class="p">(</span><span class="nx">ci</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateIndexStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span 
class="p">[</span><span class="nx">ci</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">name</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrIndexAlreadyExists</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">index</span><span class="p">{</span> <span class="w"> </span><span class="nx">exp</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">exp</span><span class="p">,</span> <span 
class="w"> </span><span class="nx">unique</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">unique</span><span class="p">,</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">primaryKey</span><span class="p">,</span> <span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="nx">ci</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span> <span class="w"> </span><span class="nx">tree</span><span class="p">:</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">New</span><span class="p">(),</span> <span class="w"> </span><span class="nx">typ</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;rbtree&quot;</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And that's it for creation of tables and indexes! Table creation is also the last time we need to make changes to the gosql frontend. 
The rest of the changes simply wrap existing insertion and selection.</p> <h3 id="upgrading-insert">Upgrading INSERT</h3><p>When a row is inserted into a table, each index on that table needs to process the row so it can add value-to-row mappings to the index.</p> <p class="note"> In the project code, you'll notice logic in <code>CreateIndex</code> to also go back over all existing rows to add them to the new index. This post does not discuss further the case where an index is created after its table. After reading this post, that case should be easy to follow. </p><p>Adding a row to an index is a matter of evaluating the index expression against that row and storing the resulting value in the tree. Along with the value, we store the integer index of the row in the table.</p> <p>We reject null values, and if the index is required to be unique, we first check that the value does not yet exist in the tree.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">addRow</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexValue</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span 
class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">indexValue</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrViolatesNotNullConstraint</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">unique</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">Has</span><span class="p">(</span><span class="nx">treeItem</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">indexValue</span><span class="p">})</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrViolatesUniqueConstraint</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span 
class="p">.</span><span class="nx">InsertNoReplace</span><span class="p">(</span><span class="nx">treeItem</span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">indexValue</span><span class="p">,</span> <span class="w"> </span><span class="nx">index</span><span class="p">:</span><span class="w"> </span><span class="nx">rowIndex</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And that's it for insertion!</p> <h3 id="upgrading-select">Upgrading SELECT</h3><p>Until now, the logic for selecting rows from a table has been to pick the table and iterate over all rows. If a row does not match the <code>WHERE</code> filter, we skip it.</p> <p>If the table has an index and the <code>WHERE</code> AST uses that index in a recognized pattern (more on that later), we can pre-filter the table based on the index before iterating over each row. We can do this for each index and for each time a recognized pattern shows up.</p> <p class="note"> This process is called query planning. We build a simplified version of what you may see in SQL databases, specifically focusing on index usage since we don't yet support <code>JOIN</code>s. For further reading, SQLite has an <a href="https://www.sqlite.org/queryplanner.html#_lookup_by_index">excellent document</a> on their query planner for index usage. 
</p><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Results</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">ResultColumn</span><span class="p">{}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">memoryCell</span><span class="p">{{}}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">i</span> <span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">e</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="o">*</span><span class="nx">val</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">finalItems</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">ResultColumn</span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Results</span><span 
class="p">{</span> <span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span> <span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The change is small and easy to miss, so here it is called out. Each applicable index narrows <code>t</code> to a subset of rows before the main loop runs:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">i</span> <span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">iAndE</span><span class="p">.</span><span class="nx">e</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 
id="getapplicableindexes">getApplicableIndexes</h4><p>There are probably a few very simple patterns we could look for, but for now we look for boolean expressions joined by <code>AND</code> that contain an index expression.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">getApplicableIndexes</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">indexAndExpression</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span> <span class="w"> </span><span class="nx">linearizeExpressions</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">where</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exps</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exps</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">linearizeExpressions</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">where</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">exps</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">where</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">linearizeExpressions</span><span class="p">(</span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">expression</span><span class="p">{})</span> <span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">indexAndExpression</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span 
class="nx">exps</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="p">.</span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">iAndE</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">iAndE</span><span class="p">,</span><span class="w"> </span><span class="nx">indexAndExpression</span><span class="p">{</span> <span class="w"> </span><span class="nx">i</span><span class="p">:</span><span class="w"> </span><span class="nx">index</span><span class="p">,</span> <span class="w"> </span><span class="nx">e</span><span class="p">:</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">iAndE</span> <span class="p">}</span> </pre></div> <p>More specifically though, within binary operations we only support matching on an index if the following three conditions are met:</p> <ul> 
<li>the operator is one of <code>=</code>, <code>&lt;&gt;</code>, <code>&gt;</code>, <code>&lt;</code>, <code>&gt;=</code>, or <code>&lt;=</code></li> <li>one of the operands is an identifier literal that matches the index's <code>exp</code> value</li> <li>the other operand is a literal value</li> </ul> <p class="note"> This matching is simpler and stricter than PostgreSQL's, where you can index expressions more generally, not just identifier literals. </p><div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">binary</span> <span class="w"> </span><span class="c1">// Find the column and the value in the boolean expression</span> <span class="w"> </span><span class="nx">columnExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">a</span> 
<span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">b</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnExp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columnExp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">b</span> <span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">a</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Neither side is applicable, return nil</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">columnExp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">exp</span><span class="p">.</span><span class="nx">generateCode</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">supportedChecks</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="p">[]</span><span class="nx">symbol</span><span class="p">{</span><span class="nx">eqSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">gtSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">gteSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">ltSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">lteSymbol</span><span class="p">}</span> <span class="w"> </span><span class="nx">supported</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">sym</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">supportedChecks</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">sym</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">be</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">supported</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">supported</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valueExp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Only index checks on literals supported&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">valueExp</span> <span class="p">}</span> </pre></div> <p>And that's it for finding applicable indexes.</p> <h4 id="newtablefromsubset">newTableFromSubset</h4><p>The last remaining piece is to go from a boolean expression in a <code>WHERE</code> clause (where an index is applicable) to a subset of rows in a table.</p> <p>Since we are only working with patterns of the type <code>indexed-column OP literal-value</code>, we grab the literal using the previous <code>applicableValue</code> helper. 
Then we look up that literal value in the index and return a new table with every row in the index that meets the condition of the operator for the literal value.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="nx">index</span><span class="p">)</span><span class="w"> </span><span class="nx">newTableFromSubset</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">applicableValue</span><span class="p">(</span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">valueExp</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">().</span><span 
class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">valueExp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">tiValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">treeItem</span><span class="p">{</span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">value</span><span class="p">}</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">uint</span><span class="p">{}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">exp</span><span class="p">.</span><span class="nx">binary</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span 
class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="p">.</span><span 
class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Inf</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Equal</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ltSymbol</span><span class="p">:</span> <span class="w"> </span><span
class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">DescendLessOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">lteSymbol</span><span class="p">:</span> <span 
class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">DescendLessOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gtSymbol</span><span 
class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span 
class="nx">gteSymbol</span><span class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="p">.</span><span class="nx">tree</span><span class="p">.</span><span class="nx">AscendGreaterOrEqual</span><span class="p">(</span><span class="nx">tiValue</span><span class="p">,</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="nx">llrb</span><span class="p">.</span><span class="nx">Item</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ti</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="p">.(</span><span class="nx">treeItem</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">ti</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">indexes</span><span class="p">,</span><span class="w"> </span><span class="nx">ti</span><span class="p">.</span><span class="nx">index</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span 
class="w"> </span><span class="nx">newT</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">createTable</span><span class="p">()</span> <span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span> <span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span> <span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">indexes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">indexes</span> <span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">memoryCell</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">indexes</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">newT</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span 
class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">[</span><span class="nx">index</span><span class="p">])</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">newT</span> <span class="p">}</span> </pre></div> <p>As you can see, an index does not necessarily improve on a linear search. Imagine a table of 1 million rows indexed on an autoincrementing column. Imagine filtering on <code>col &gt; 10</code>. The index may be able to eliminate about 10 rows but still returns a pre-filtered table of around 1 million rows that must be passed through the <code>WHERE</code> filter.</p> <p>Additionally, since we process each boolean expression one at a time, we can't take advantage of knowledge that might seem obvious to a human when two boolean expressions together bound a range. For example, in <code>x &gt; 10 AND x &lt; 20</code> we can see that only integers from 11 to 19 are applicable. But the current logic goes through each expression separately and finds all rows that match either one; only the final linear search through the pre-filtered rows eliminates the bulk.</p> <p class="note"> Thankfully, real databases have decades of optimizations. But even then it can be difficult to know which index usages are being optimized without reading documentation, benchmarking, using <code>EXPLAIN ANALYSE</code>, or reading the source. </p><p>But that's it for changes needed to support basic indexes end-to-end!</p> <h3 id="trialing-an-index">Trialing an index</h3><p>Since the addition of indexes is so seamless, it is difficult to tell without a trial whether the index is effective. So we write a simple program that inserts N rows, with or without an index. Finally, it queries for the first and last items inserted.
We show time and memory used during both insertion and selection.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;runtime&quot;</span> <span class="w"> </span><span class="s">&quot;strconv&quot;</span> <span class="w"> </span><span class="s">&quot;time&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/eatonphil/gosql&quot;</span> <span class="p">)</span> <span class="kd">var</span><span class="w"> </span><span class="nx">inserts</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="kd">var</span><span class="w"> </span><span class="nx">lastId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="kd">var</span><span class="w"> </span><span class="nx">firstId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="kd">func</span><span class="w"> </span><span class="nx">doInsert</span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">inserts</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">lastId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">i</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">firstId</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastId</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;INSERT INTO users VALUES (%d)&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">lastId</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">InsertStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">doSelect</span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;SELECT id FROM users WHERE id = %d&quot;</span><span class="p">,</span><span class="w"> 
</span><span class="nx">lastId</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">SelectStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Expected 1 row&quot;</span><span class="p">)</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">())</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">inserts</span><span class="o">-</span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;Bad row, got: %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">()))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;SELECT id FROM users WHERE id = %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">firstId</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span
class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">SelectStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="s">&quot;Expected 1 row&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span 
class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">())</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;Bad row, got: %d&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">r</span><span class="p">.</span><span class="nx">Rows</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">].</span><span class="nx">AsInt</span><span class="p">()))</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">,</span><span class="w"> </span><span class="nx">cb</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Backend</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">start</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Now</span><span class="p">()</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span
class="nx">Println</span><span class="p">(</span><span class="s">&quot;Starting&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">)</span> <span class="w"> </span><span class="nx">cb</span><span class="p">(</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Finished %s: %f seconds\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">time</span><span class="p">.</span><span class="nx">Since</span><span class="p">(</span><span class="nx">start</span><span class="p">).</span><span class="nx">Seconds</span><span class="p">())</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">m</span><span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">MemStats</span> <span class="w"> </span><span class="nx">runtime</span><span class="p">.</span><span class="nx">ReadMemStats</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">m</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Alloc = %d MiB\n\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">m</span><span class="p">.</span><span class="nx">Alloc</span><span class="o">/</span><span class="mi">1024</span><span class="o">/</span><span class="mi">1024</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">mb</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">NewMemoryBackend</span><span class="p">()</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--with-index&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;--inserts&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">inserts</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="nx">i</span><span class="o">+</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">index</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">primaryKey</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot; PRIMARY KEY&quot;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parser</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parser</span><span class="p">{}</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;CREATE TABLE users (id INT%s)&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">primaryKey</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">mb</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span class="p">[</span><span class="mi">0</span><span class="p">].</span><span class="nx">CreateTableStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">indexingString</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot; with indexing enabled&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">index</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">indexingString</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Inserting %d rows%s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">inserts</span><span class="p">,</span><span class="w"> </span><span class="nx">indexingString</span><span class="p">)</span> <span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="s">&quot;INSERT&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">,</span><span class="w"> </span><span class="nx">doInsert</span><span 
class="p">)</span> <span class="w"> </span><span class="nx">perf</span><span class="p">(</span><span class="s">&quot;SELECT&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">,</span><span class="w"> </span><span class="nx">doSelect</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Build and run once without an index:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>cmd/indextest/main.go ./main<span class="w"> </span>--inserts<span class="w"> </span><span class="m">1000000</span> Inserting<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>rows Starting<span class="w"> </span>INSERT Finished<span class="w"> </span>INSERT:<span class="w"> </span><span class="m">76</span>.175133<span class="w"> </span>seconds <span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">239</span><span class="w"> </span>MiB Starting<span class="w"> </span>SELECT Finished<span class="w"> </span>SELECT:<span class="w"> </span><span class="m">1</span>.301556<span class="w"> </span>seconds <span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">238</span><span class="w"> </span>MiB </pre></div> <p>And run again with an index:</p> <div class="highlight"><pre><span></span>./main<span class="w"> </span>--inserts<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>--with-index Inserting<span class="w"> </span><span class="m">1000000</span><span class="w"> </span>rows<span class="w"> </span>with<span class="w"> </span>indexing<span class="w"> </span>enabled Starting<span class="w"> </span>INSERT Finished<span class="w"> </span>INSERT:<span class="w"> </span><span class="m">89</span>.108121<span class="w"> </span>seconds <span class="nv">Alloc</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="m">341</span><span class="w"> </span>MiB Starting<span class="w"> </span>SELECT Finished<span class="w"> </span>SELECT:<span class="w"> </span><span class="m">0</span>.000137<span class="w"> </span>seconds <span class="nv">Alloc</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">341</span><span class="w"> </span>MiB </pre></div> <p>The basic tradeoff is visible in these two runs: in exchange for more memory (341 MiB versus 239 MiB) and a longer insertion time (about 89 seconds versus 76), the indexed lookup drops from about 1.3 seconds to well under a millisecond.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Very excited to share the latest database basics post on implementing indexes in gosql.<a href="https://t.co/QHfjCe1XsC">https://t.co/QHfjCe1XsC</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1256209468133650433?ref_src=twsrc%5Etfw">May 1, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/database-basics-indexes.htmlFri, 01 May 2020 00:00:00 +0000Writing a SQL database from scratch in Go: 2. binary expressions and WHERE filtershttp://notes.eatonphil.com/database-basics-expressions-and-where.html<p class="note"> Previously in database basics: <!-- forgive me, for I have sinned --> <br /> <a href="/database-basics.html">1. SELECT, INSERT, CREATE and a REPL</a> <br /> <br /> Next in database basics: <br /> <a href="/database-basics-indexes.html">3. indexes</a> <br /> <a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a> </p><p>In this post, we'll extend <a href="https://github.com/eatonphil/gosql">gosql</a> to support binary expressions and very simple filtering on SELECT results via WHERE. We'll introduce a general mechanism for interpreting an expression on a row in a table. 
The expression may be an identifier (where the result is the value of the cell corresponding to that column in the row), a numeric literal, a combination via a binary expression, etc.</p> <p>The following interactions will be possible:</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="nb">INT</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;Stephen&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span> <span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">age</span> <span class="c1">----------+------</span> <span class="n">Stephen</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16</span> <span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">result</span><span class="p">)</span> <span class="n">ok</span> <span 
class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">&#39;Adrienne&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="n">age</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span> <span class="n">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">name</span> <span class="c1">------+-----------</span> <span class="mi">25</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Adrienne</span> <span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="k">result</span><span class="p">)</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span> <span class="n">name</span> <span class="c1">------------</span> <span class="n">Stephen</span> <span class="n">Adrienne</span> <span class="p">(</span><span class="mi">2</span><span class="w"> </span><span 
class="n">results</span><span class="p">)</span> <span class="n">ok</span> </pre></div> <p>The changes we'll make in this post are roughly a walkthrough of <a href="https://github.com/eatonphil/gosql/commit/bd6a5d0d4a7410699b0d01beaabf91923df34b28">this commit</a>.</p> <h3 id="boilerplate-updates">Boilerplate updates</h3><p>There are a few updates to pick up that I won't go into in this post. Grab the following files from the main repo:</p> <ul> <li><a href="https://github.com/eatonphil/gosql/blob/master/lexer.go">lexer.go</a><ul> <li>The big change here is to use the same keyword matching algorithm for symbols. This allows us to support symbols that are longer than one character.</li> <li>This file also now includes the following keywords and symbols: <code>and</code>, <code>or</code>, <code>true</code>, <code>false</code>, <code>=</code>, <code>&lt;&gt;</code>, <code>||</code>, and <code>+</code>.</li> </ul> </li> <li><a href="https://github.com/eatonphil/gosql/blob/master/cmd/main.go">cmd/main.go</a><ul> <li>This file now uses a <a href="https://github.com/olekukonko/tablewriter">third-party table-rendering library</a> instead of the hacky, handwritten original one.</li> <li>This also uses a <a href="https://github.com/chzyer/readline">third-party readline implementation</a> so you get history and useful cursor movement in the REPL.</li> </ul> </li> </ul> <h4 id="parsing-boilerplate">Parsing boilerplate</h4><p>We'll redefine three helper functions in <code>parser.go</code> before going further: <code>parseToken</code>, <code>parseTokenKind</code>, and <code>helpMessage</code>.</p> <p>The <code>parseToken</code> helper will consume a token if it matches the one provided as an argument (ignoring location).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span 
class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">p</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">];</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">p</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>The <code>parseTokenKind</code> helper will consume a token if it is the same kind as an argument provided.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseTokenKind</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">current</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">current</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>And the <code>helpMessage</code> helper will give an indication of where in a program something happened.</p> <div 
class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="o">+</span><span class="mi">1</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span 
class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;[%d,%d]: %s, near: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h3 id="parsing-binary-expressions">Parsing binary expressions</h3><p>Next we'll extend the AST structure in <code>ast.go</code> to support a "binary kind" of expression. The binary expression will have two sub-expressions and an operator.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">binaryKind</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">binaryExpression</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="nx">expression</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="nx">expression</span> <span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="nx">token</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">expression</span><span class="w"> </span><span class="kd">struct</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">literal</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="nx">binary</span><span class="w"> </span><span class="o">*</span><span class="nx">binaryExpression</span> <span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">expressionKind</span> <span class="p">}</span> </pre></div> <p>We'll use Pratt parsing to handle operator precedence. There is an excellent introduction to this technique <a href="https://matklad.github.io/2020/04/13/simple-but-powerful-pratt-parsing.html">here</a>.</p> <p>If at the beginning of parsing we see a left parenthesis, we'll consume it and parse an expression within it. Then we'll look for a right parenthesis. Otherwise we'll look for a non-binary expression first (e.g. symbol, number).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftParenSymbol</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">rightParenToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightParenSymbol</span><span class="p">)</span> <span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span 
class="p">,</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">),</span><span class="w"> </span><span class="nx">minBp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected expression after opening paren&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected closing paren&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLiteralExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span 
class="p">}</span> </pre></div> <p>Then we'll look for a binary operator (e.g. <code>=</code>, <code>and</code>) or delimiter. If we find an operator and it is of lesser "binding power" than the current minimum (<code>minBp</code> passed as an argument to the parse function with a default value of <code>0</code>), we'll return the current expression.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span> <span class="nx">outer</span><span class="p">:</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">eqSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">neqSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">concatSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">plusSymbol</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="k">range</span><span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected binary operator&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span 
class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">op</span><span class="p">.</span><span class="nx">bindingPower</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastCursor</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> </pre></div> <p>The <code>bindingPower</code> function on tokens can be defined for now such that addition, concatenation, and the equality operators share the highest binding power (<code>3</code>), followed by the boolean operators (<code>1</code>), with everything else at zero.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">bindingPower</span><span class="p">()</span><span class="w"> </span><span class="kt">uint</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">switch</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">:</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">keyword</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">andKeyword</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">orKeyword</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">:</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">concatSymbol</span><span 
class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">plusSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">3</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span> <span class="p">}</span> </pre></div> <p>Back in <code>parseExpression</code>, if the new operator's binding power is not less than <code>minBp</code>, we'll parse the next operand expression (a recursive call, passing the binding power of the new operator as the new <code>minBp</code>).</p> <p>Upon completion, the current expression is set to a new binary expression containing the previous current expression on the left and the just-parsed expression (the return value of the recursive call) on the right.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">bp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span 
class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected right operand&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">expression</span><span class="p">{</span> <span class="w"> </span><span class="nx">binary</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">binaryExpression</span><span class="p">{</span> <span class="w"> </span><span class="o">*</span><span class="nx">exp</span><span class="p">,</span> <span class="w"> </span><span class="o">*</span><span class="nx">b</span><span class="p">,</span> <span class="w"> </span><span class="o">*</span><span class="nx">op</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cursor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>All together:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="kt">uint</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">*</span><span class="nx">expression</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftParenSymbol</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">rightParenToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightParenSymbol</span><span class="p">)</span> <span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">delimiters</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">),</span><span class="w"> </span><span class="nx">minBp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> 
</span><span class="s">&quot;Expected expression after opening paren&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">rightParenToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected closing paren&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">exp</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseLiteralExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span> <span class="nx">outer</span><span class="p">:</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">_</span><span 
class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">andKeyword</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">orKeyword</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">eqSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">neqSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">concatSymbol</span><span class="p">),</span> <span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">plusSymbol</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="kd">var</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">binOps</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">bo</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">t</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">op</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected binary operator&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">op</span><span class="p">.</span><span class="nx">bindingPower</span><span class="p">()</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">bp</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nx">minBp</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">lastCursor</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">,</span><span 
class="w"> </span><span class="nx">bp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected right operand&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">expression</span><span class="p">{</span> <span class="w"> </span><span class="nx">binary</span><span class="p">:</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">binaryExpression</span><span class="p">{</span> <span class="w"> </span><span class="o">*</span><span class="nx">exp</span><span class="p">,</span> <span class="w"> </span><span class="o">*</span><span class="nx">b</span><span class="p">,</span> <span class="w"> </span><span class="o">*</span><span class="nx">op</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> 
<span class="w"> </span><span class="nx">lastCursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cursor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>Now that we have this general parse expression helper in place, we can add support for parsing <code>WHERE</code> in <code>SELECT</code> statements.</p> <h3 id="parsing-where">Parsing WHERE</h3><p>This part's pretty easy. We modify the existing <code>parseSelectStatement</code> to search for an optional <code>WHERE</code> token followed by an expression.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span 
class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">selectKeyword</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">{}</span> <span class="w"> </span><span class="nx">fromToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">)</span> <span class="w"> </span><span class="nx">item</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">parseSelectItem</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">fromToken</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">item</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">whereToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">whereKeyword</span><span class="p">)</span> <span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span 
class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">fromToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseFromItem</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected FROM item&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">from</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">whereToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">where</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">delimiter</span><span class="p">},</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected WHERE conditionals&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">where</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>Now we're all done with parsing binary expressions and <code>WHERE</code> filters! If in doubt, refer to <a href="https://github.com/eatonphil/gosql/blob/master/parser.go">parser.go</a> in the project.</p> <h3 id="re-thinking-query-execution">Re-thinking query execution</h3><p>In the first post in this series, we didn't establish any standard way for interpreting an expression in any kind of statement. In SQL though, every expression is always run in the context of a row in a table. 
We'll handle cases like <code>SELECT 1</code> and <code>INSERT INTO users VALUES (1)</code> by creating a table with a single empty row to act as the context.</p> <p>This requires a bit of re-architecting, so we'll rewrite the <code>memory.go</code> implementation from scratch in this post.</p> <p>We'll also stop <code>panic</code>-ing when things go wrong. Instead we'll print a message. This lets the REPL keep going.</p> <h4 id="memory-cells">Memory cells</h4><p>Again, the fundamental block of memory in the table will be an untyped array of bytes. We'll provide conversion methods from a memory cell into Go integers, strings, and booleans.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="kt">int32</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="nx">mc</span><span class="p">),</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span
class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Corrupted data [%s]: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">mc</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">mc</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">len</span><span 
class="p">(</span><span class="nx">mc</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="mi">0</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">equals</span><span class="p">(</span><span class="nx">b</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Seems verbose but need to make sure if one is nil, the</span> <span class="w"> </span><span class="c1">// comparison still fails quickly</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">mc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">mc</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Compare</span><span class="p">(</span><span class="nx">mc</span><span class="p">,</span><span 
class="w"> </span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="p">}</span> </pre></div> <p>We'll also extend the <code>Cell</code> interface in <code>backend.go</code> to support the new boolean type.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">gosql</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">IntType</span> <span class="w"> </span><span class="nx">BoolType</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Cell</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span> <span class="w"> </span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="kt">bool</span> <span class="p">}</span> <span class="o">...</span> </pre></div> <p>Finally, we need a way to map a literal token <em>into</em> a memory cell.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span
class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">numericKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">new</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">)</span> <span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Corrupted data [%s]: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span 
class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// TODO: handle bigint</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">int32</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;Corrupted data [%s]: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">()),</span><span class="w"> </span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">boolKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">([]</span><span class="kt">byte</span><span class="p">{</span><span class="mi">1</span><span class="p">})</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="kc">nil</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>And we'll provide global 
<code>true</code> and <code>false</code> values:</p> <div class="highlight"><pre><span></span><span class="kd">var</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">trueToken</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">boolKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;true&quot;</span><span class="p">}</span> <span class="w"> </span><span class="nx">falseToken</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">boolKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="s">&quot;false&quot;</span><span class="p">}</span> <span class="w"> </span><span class="nx">trueMemoryCell</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">trueToken</span><span class="p">)</span> <span class="w"> </span><span class="nx">falseMemoryCell</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">falseToken</span><span class="p">)</span> <span class="p">)</span> </pre></div> <h4 id="tables">Tables</h4><p>A table has a list of rows (each an array of memory cells) and a list of column names and types.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">table</span><span
class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">[]</span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span> <span class="p">}</span> </pre></div> <p>Finally, we'll add a series of methods on <code>table</code> that, given a row index, interpret an expression AST against that row in the table.</p> <h3 id="interpreting-literals">Interpreting literals</h3><p>First we'll implement <code>evaluateLiteralCell</code>, which looks up an identifier in the row or returns the value of an integer, string, or boolean literal.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateLiteralCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span
class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">lit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">literal</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">identifierKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span 
class="nx">rows</span><span class="p">[</span><span class="nx">rowIndex</span><span class="p">][</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">tableCol</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">IntType</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">boolKind</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">columnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">BoolType</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="nx">lit</span><span class="p">),</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="interpreting-binary-expressions">Interpreting binary expressions</h3><p>Now we can implement <code>evaluateBinaryCell</code>, which evaluates its two sub-expressions and combines them according to the operator. The SQL operators we have defined so far do no coercion, so we'll fail immediately if the two sides of the operation are not of the same type.
Additionally, the concatenation and addition operators require that their arguments are strings and numbers, respectively.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateBinaryCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">binaryKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">bexp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span 
class="nx">binary</span> <span class="w"> </span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">lt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">a</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">rt</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">b</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">:</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">eqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span 
class="nx">rt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">rt</span><span 
class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">eq</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">falseMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">neqSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">!</span><span class="nx">l</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">trueMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span 
class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">falseMemoryCell</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">concatSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span><span 
class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()}),</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">TextType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">plusSymbol</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">IntType</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">iValue</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">int</span><span class="p">(</span><span class="nx">l</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">literalToMemoryCell</span><span class="p">(</span><span 
class="o">&amp;</span><span class="nx">token</span><span class="p">{</span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Itoa</span><span class="p">(</span><span class="nx">iValue</span><span class="p">)}),</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">IntType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">:</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">keyword</span><span class="p">(</span><span class="nx">bexp</span><span class="p">.</span><span class="nx">op</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">andKeyword</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">falseMemoryCell</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">trueMemoryCell</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">orKeyword</span><span class="p">:</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lt</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">rt</span><span class="w"> 
</span><span class="o">!=</span><span class="w"> </span><span class="nx">BoolType</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidOperands</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">falseMemoryCell</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">trueMemoryCell</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;?column?&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">BoolType</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span> <span class="p">}</span> </pre></div> <p>Then we'll provide a generic <code>evaluateCell</code> method to wrap these two correctly:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">)</span><span class="w"> </span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nx">MemoryCell</span><span class="p">,</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ColumnType</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateLiteralCell</span><span class="p">(</span><span class="nx">rowIndex</span><span 
class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">binaryKind</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateBinaryCell</span><span class="p">(</span><span class="nx">rowIndex</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrInvalidCell</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="implementing-select">Implementing SELECT</h3><p>As before, each statement will operate on a backend of tables.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">NewMemoryBackend</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="o">&amp;</span><span class="nx">MemoryBackend</span><span class="p">{</span> <span class="w"> </span><span class="nx">tables</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span><span class="p">{},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>When we implement <code>SELECT</code>, we'll iterate over each row in the table (we only support looking up one table for now). If the <code>SELECT</code> statement contains a <code>WHERE</code> block, we'll evaluate the <code>WHERE</code> expression against the current row and skip the row if the result is <code>false</code>.</p> <p>Otherwise we'll evaluate each expression in the <code>SELECT</code> list of items against the current row in the table.</p> <p>If there is no table selected, we provide a fake table with a single empty row.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">table</span><span class="p">{}</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span 
class="o">||</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Results</span><span class="p">{},</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="p">}{}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">table</span><span class="p">{}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span><span class="p">{{}}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">results</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">val</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span 
class="nx">slct</span><span class="p">.</span><span class="nx">where</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">val</span><span class="p">.</span><span class="nx">AsBool</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">asterisk</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: handle asterisk</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Skipping asterisk.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> 
</span><span class="p">}</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="nb">uint</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="o">*</span><span class="nx">col</span><span class="p">.</span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="p">}{</span> 
<span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">columnType</span><span class="p">,</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">columnName</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Results</span><span class="p">{</span> <span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span> <span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="implementing-insert,-create">Implementing INSERT, CREATE</h3><p>The <code>INSERT</code> and <code>CREATE</code> statements stay mostly the same, except that we'll use the <code>evaluateCell</code> helper for every expression. 
Refer back to the first post if the implementation is otherwise unclear.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="nx">inst</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">inst</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">MemoryCell</span><span class="p">{}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrMissingValues</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Skipping non-literal.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">emptyTable</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">table</span><span class="p">{}</span> <span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">emptyTable</span><span class="p">.</span><span class="nx">evaluateCell</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="kc">nil</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">table</span><span class="p">{}</span> <span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">t</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">crt</span><span 
class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;int&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;text&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="back-to-the-repl">Back to the REPL</h3><p>Putting it all together, we run the following session:</p> <div class="highlight"><pre><span></span><span class="err">#</span><span class="w"> </span><span class="nx">CREATE</span><span class="w"> </span><span class="nx">TABLE</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="p">(</span><span class="nx">name</span><span class="w"> </span><span class="nx">TEXT</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="nx">INT</span><span class="p">);</span> <span class="nx">ok</span> <span class="err">#</span><span class="w"> </span><span class="nx">INSERT</span><span class="w"> </span><span class="nx">INTO</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">VALUES</span><span class="w"> </span><span class="p">(</span><span class="err">&#39;</span><span class="nx">Stephen</span><span class="err">&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">16</span><span class="p">);</span> <span class="nx">ok</span> <span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="p">;</span> <span 
class="nx">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">age</span> <span class="o">----------+------</span> <span class="nx">Stephen</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">16</span> <span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span> <span class="nx">ok</span> <span class="err">#</span><span class="w"> </span><span class="nx">INSERT</span><span class="w"> </span><span class="nx">INTO</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">VALUES</span><span class="w"> </span><span class="p">(</span><span class="err">&#39;</span><span class="nx">Adrienne</span><span class="err">&#39;</span><span class="p">,</span><span class="w"> </span><span class="mi">23</span><span class="p">);</span> <span class="nx">ok</span> <span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="w"> </span><span class="nx">WHERE</span><span class="w"> </span><span class="nx">age</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">23</span><span class="p">;</span> <span class="nx">age</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">name</span> <span class="o">------+-----------</span> <span class="mi">25</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="nx">Adrienne</span> <span class="p">(</span><span class="mi">1</span><span class="w"> </span><span class="nx">result</span><span 
class="p">)</span> <span class="nx">ok</span> <span class="err">#</span><span class="w"> </span><span class="nx">SELECT</span><span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">FROM</span><span class="w"> </span><span class="nx">users</span><span class="p">;</span> <span class="nx">name</span> <span class="o">------------</span> <span class="nx">Stephen</span> <span class="nx">Adrienne</span> <span class="p">(</span><span class="mi">2</span><span class="w"> </span><span class="nx">results</span><span class="p">)</span> <span class="nx">ok</span> </pre></div> <p>And that's it for now! In future posts we'll get into indices, joining tables, etc.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post up in the database basics series: adding support for binary expressions and WHERE filtering in SELECTs.<br><br>Much nicer to have a real table rendering library and readline implementation in the REPL too.<a href="https://t.co/GYzn3FUNon">https://t.co/GYzn3FUNon</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1249426633347473408?ref_src=twsrc%5Etfw">April 12, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/database-basics-expressions-and-where.htmlSun, 12 Apr 2020 00:00:00 +0000Studying foreign languages with inbox zerohttp://notes.eatonphil.com/studying-with-inbox-zero.html<p>The only time I've been able to seriously, rapidly improve my ability to speak a foreign language was through intensive language courses in college. I was forced to actively speak, read, and write Chinese for 6-8 hours a week (1-2 hours every day). Then study another 5-10 hours a week in preparation for the active sessions. I went three semesters like this before I left school.</p> <p>I've been trying to recreate that intensity since and mostly failed. 
After marrying a Korean, I've redirected the little effort I can muster to learning Korean. Aside from stints over the years (mostly for a month or two before or after a trip to Korea), I haven't been able to keep up any practice.</p> <p>One thing I've tried over the years to commit myself to learning a number of different topics is to set up recurring calendar invites: "Study Linux", "Study TCP/IP", "Study Korean", etc.</p> <p>This has mostly failed too. However, I do always <em>look</em> at the invites as I get notified.</p> <p>I keep inbox zero and I check my email many times a day, diligently marking each email read when I no longer need to think about it.</p> <p>Tools like Quizlet, Anki, or even Duolingo let you self-learn vocabulary <em>when you feel like it</em>. But basically no service will try to keep giving you exposure to some set of topics whether you spend time on it or not.</p> <p>The most important thing I can think of is forced exposure to vocabulary. So I've been planning for some time to hook up a list of the one thousand most common Korean words to scheduled emails.</p> <p>This weekend I finally got around to scripting the Google Calendar API against the words list. I have an event for each word for the next 1000 days. Each day I receive a summary email including all events of the day and the new word is part of it.</p> <p>This is a pretty indirect approach but it's pretty simple to set up. It's not very easy to reconfigure.</p> <p>The code for doing this is <a href="https://github.com/eatonphil/learnit">available on Github</a> if you're interested. And if you know a service that can build and manage scheduled notifications against a spreadsheet or database I'd rather be looking at that.</p> <p>We'll see how this works out.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Daily new words in my inbox feels like the only way I can &quot;force&quot; myself to get exposed to new vocabulary. 
Wish there were a service for scheduling notifications from a spreadsheet. Finally got to scripting GCal&#39;s API populating daily events from 1000 most common Korean words</p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1246557948068925441?ref_src=twsrc%5Etfw">April 4, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/studying-with-inbox-zero.htmlSat, 04 Apr 2020 00:00:00 +0000Reviewing the Surface Book 2http://notes.eatonphil.com/reviewing-the-surface-book-2.html<p>The first few paragraphs cover what I was looking for and what I considered. Then the review.</p> <h3 id="why-the-surface-book-2">Why the Surface Book 2</h3><p>I used a Macbook throughout my professional career until I had the choice a few years ago when I started my current job. Here, I ran Gentoo, then FreeBSD, then Arch, and now Windows 10 on the Dell XPS 15.</p> <p>I enjoy Windows and I think Microsoft is doing a better job on hardware and software these days. At least, compared to Apple, they appear to be trying. So when my personal 2015 Macbook Pro died this year I decided to buy and run Windows at home.</p> <p>On my Mac, I dealt with bad battery life for a while: running VMs, running Docker, compiling Go, running Node.js kills any battery. So I moved my development into the cloud and gained on battery life and network speeds at the cost of memory (I am paying for 4GB of RAM).</p> <p>My ideal replacement was a cheaper machine that felt as good as a 2015 Macbook Pro. (The build quality has not been good since.) I was hoping not to pay more than $1000. My shortlist included the Surface Book 2, the Surface Pro X, the Surface Laptop 3, the Lenovo Yoga 14, and the LG Gram. So I went to Best Buy to try them out.</p> <p>I was impressed by every Surface device. At first sight, I mistook the Surface Book and Surface Laptop for an old Macbook Pro. 
They both have a brushed aluminum body with a large trackpad and great keyboards. Even the Surface Pro X, which is a tablet, has an add-on keyboard that is easy to type (that is, program) on.</p> <p>I tried out the Lenovo Yoga 14 and it was solid, but I preferred the brushed aluminum body of the Surface devices. I did not get a chance to feel out the LG Gram.</p> <p>I eliminated the Surface Laptop 3 because I like tablet mode. While the Surface Laptop 3 has a touchscreen, it is not a 2-in-1 device and does not have tablet mode.</p> <p>And I eliminated the Surface Pro X because it is one of the first mainstream Windows ARM devices. While Windows on ARM is now the same operating system as Windows on a desktop, most consumer software ships x86_64 (not ARM) binaries. Windows on ARM can emulate x86 but not yet x86_64. I didn't feel like working around this on my primary personal device.</p> <p>I bought the 13.5", 7th generation i5 Surface Book 2 for $999. It comes with 8GB DDR4 RAM and a 128GB SSD. I have had the device for two weeks now and I use it at least 10 hours a day.</p> <h3 id="keyboard">Keyboard</h3><p>The keyboard layout is standard and easy to use. The control, shift, caps, function, and alt keys are big enough that it is easy to program without staring at the keyboard. The up and down arrow keys are smaller than would be nice. But they are easier for me to find than on a 2019 Macbook Pro.</p> <p>The function key is modal by default (like a Caps key) and indicates if function is enabled with a small LED. I have never seen a function key like this. I find it annoying when I turn it on.</p> <p>And while there are built-in volume controls and a play/pause button, there is no media forward/back button. I assigned Ctrl+Windows+Alt+Left/Right to be media forward/back.</p> <p>There is also no right Ctrl key. Instead there is a "media key" which is the equivalent of right-clicking... I guess. 
This is useless so I mapped it back to another Ctrl key.</p> <p>Unlike macOS, which needs an app like Spectacle, Windows' default window control shortcuts are great. Windows+Left to send to the left half, Windows+Right to send to the right half, Windows+Up to make full screen.</p> <p>But macOS's default swipe gestures are more intuitive: swipe left to go backwards, swipe right to go forwards. So I mapped this back myself.</p> <p><a href="https://gist.github.com/eatonphil/0a684561d599fcd94128ff462a5253b7">Here is my AutoHotkey script.</a></p> <h3 id="screen">Screen</h3><p>The 13.5" screen feels top-heavy but may not actually weigh more than the keyboard/body. The bezel is larger than it feels like it should be. But the camera is in the right location: top and center.</p> <p>Additionally, the default behavior when attaching/detaching the screen is to prompt you to enter/exit tablet mode rather than doing it for you. This prompt is easy to click out of and after doing so the option to switch between modes disappears until you reattach and detach again.</p> <p>The screen isn't flush with the body when you close it. Few marketing pictures show you this, but here's <a href="https://assets.pcmag.com/media/images/563021-microsoft-surface-book-2-15-inch.jpg?thumb=y">one</a>. This makes me worry something may snap if the laptop is ever slammed against a wall for some reason.</p> <p>And fully open, it only goes back 120 degrees. This makes it hard to look at if it is on your legs and your legs are up higher than 90 degrees.</p> <p>Finally, the headphone jack is not on the body but on the screen. This makes sense since the screen is detachable. But the jack is on the top-right corner, further away than usual. This requires me to be closer to the screen to feel like I am not pulling the screen when I am wearing headphones.</p> <h4 id="pen">Pen</h4><p>The Surface Pen is awesome and the screen's palm detection is too. I have had a lot of fun drawing on it in Paint 3. 
And it has been useful in annotating mockups for work too.</p> <p>It costs $100 and comes with a AAAA battery. It is magnetized and sticks to the left side of the screen.</p> <h3 id="body">Body</h3><p>As mentioned, the body is brushed aluminum. It feels great. The power input is magnetic, which is helpful. But it uses a novel Surface-specific input rather than USB-C, so that sucks. A new charger from Microsoft costs $100.</p> <p>The speakers are as good as Macbook speakers were 5 years ago. They don't have much bass. Additionally, these speakers get a little distorted at top volume.</p> <p>The battery lasts 7-8 hours without charging. While this is as advertised, it is still disappointing for a new laptop in 2020 that is only running Chrome, Spotify, and Windows Terminal.</p> <h4 id="tablet">Tablet</h4><p>To release the screen from the body, there is a key on the function row. However, it is not a hardware release. So when I accidentally killed the battery while the screen was flipped, I couldn't detach the screen after booting (to turn it back into a laptop) until after 10-20 minutes of charging.</p> <p>The screen isn't easy to detach. It requires both hands lifting up from the base of the screen to get enough leverage. You cannot pull up from the top of the screen.</p> <p>Aside from drawing apps, tablet mode apps on Windows aren't great. Kindle for Windows on tablet is terrible. I got stuck in Kindle's full screen mode and couldn't adjust the page size or exit full screen mode without reverting to laptop mode first.</p> <p>Tablet mode also throws away the standard Windows menu and shortcuts to give you a desktop of application cards. However, these cards don't adapt to recent or frequent applications. After I deleted Candy Crush and other built-in apps I will never use, this desktop is blank except for Edge and Groove Music. It is incredible how bad the tablet desktop is. 
You have to use the full application list view every time you want to open a new program.</p> <h3 id="in-summary">In summary</h3><p>It's not a bad Windows machine for $1000. The body is great quality and the pen/screen interaction is solid. But I'd like to see Windows invest more in a useful tablet experience. And the detachable screen comes at the cost of being awkward. So I'd go with the Surface Pro X or Surface Laptop 3 next time.</p> <p>But above all I can't shake the expectation that a laptop built in 2020 running GMail and Slack in Chrome, Spotify, and a terminal application should last at least 10 hours.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a short post reviewing Microsoft&#39;s Surface Book 2<a href="https://t.co/0n6K3y6FBC">https://t.co/0n6K3y6FBC</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1241503107806384133?ref_src=twsrc%5Etfw">March 21, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/reviewing-the-surface-book-2.htmlWed, 18 Mar 2020 00:00:00 +0000Writing a SQL database from scratch in Go: 1. SELECT, INSERT, CREATE and a REPLhttp://notes.eatonphil.com/database-basics.html<p class="note"> Next in database basics: <! forgive me, for I have sinned > <br /> <a href="/database-basics-expressions-and-where.html">2. binary expressions and WHERE filters</a> <br /> <a href="/database-basics-indexes.html">3. indexes</a> <br /> <a href="/database-basics-a-database-sql-driver.html">4. a database/sql driver</a> </p><p>In this series we'll write a rudimentary database from scratch in Go. Project source code is available on <a href="https://github.com/eatonphil/gosql">Github</a>.</p> <p>In this first post we'll build enough of a parser to run some simple <code>CREATE</code>, <code>INSERT</code>, and <code>SELECT</code> queries. 
Then we'll build an in-memory backend supporting <code>TEXT</code> and <code>INT</code> types and write a basic REPL.</p> <p>We'll be able to support the following interaction:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="k">go</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="o">*</span><span class="p">.</span><span class="k">go</span> <span class="n">Welcome</span><span class="w"> </span><span class="k">to</span><span class="w"> </span><span class="n">gosql</span><span class="p">.</span> <span class="o">#</span><span class="w"> </span><span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="p">(</span><span class="n">id</span><span class="w"> </span><span class="nb">INT</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="nb">TEXT</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;Phil&#39;</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">id</span><span class="p">,</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span> <span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span><span class="w"> 
</span><span class="n">name</span><span class="w"> </span><span class="o">|</span> <span class="o">====================</span> <span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">Phil</span><span class="w"> </span><span class="o">|</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">users</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="mi">2</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;Kate&#39;</span><span class="p">);</span> <span class="n">ok</span> <span class="o">#</span><span class="w"> </span><span class="k">SELECT</span><span class="w"> </span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="k">FROM</span><span class="w"> </span><span class="n">users</span><span class="p">;</span> <span class="o">|</span><span class="w"> </span><span class="n">name</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="o">|</span> <span class="o">====================</span> <span class="o">|</span><span class="w"> </span><span class="n">Phil</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">|</span> <span class="o">|</span><span class="w"> </span><span class="n">Kate</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="o">|</span> <span class="n">ok</span> </pre></div> <p>The first stage will be to map a SQL source into a list of tokens (lexing). 
Then we'll call parse functions to find individual SQL statements (such as <code>SELECT</code>). These parse functions will in turn call their own helper functions to find patterns of recursively parseable chunks, keywords, symbols (like parentheses), identifiers (like a table name), and numeric or string literals.</p> <p>Then, we'll write an in-memory backend to do operations based on an AST. Finally, we'll write a REPL to accept SQL from a CLI and pass it to the in-memory backend.</p> <p class="note"> This post assumes a basic understanding of parsing concepts. We won't skip any code, but also won't go into great detail on why we structure the code the way we do. <br /> <br /> For a simpler introduction to parsing and parsing concepts, see <a href="/writing-a-simple-json-parser.html">this post on parsing JSON</a>. </p><h3 id="lexing">Lexing</h3><p>The lexer is responsible for finding every distinct group of characters in source code: tokens. This will consist primarily of identifiers, numbers, strings, and symbols.</p> <p class="note"> What follows is a second, more orthodox pass at lexing. The first pass took a number of shortcuts and couldn't handle spaces in strings, for example. <br /> <br /> <a href="https://github.com/eatonphil/gosql/pull/2">Here is the relevant pull request in gosql if you are curious.</a> </p><p>The gist of the logic will be to pass control to a helper function for each kind of token. If the helper function succeeds in finding a token, it will return true along with the location where the lexer should start next.
It will continue doing this until it reaches the end of the source.</p> <p>First off, we'll define a few types and constants for use in <code>lexer.go</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">gosql</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">location</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="kt">uint</span> <span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="kt">uint</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="kt">string</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">selectKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;select&quot;</span> <span class="w"> </span><span class="nx">fromKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;from&quot;</span> <span class="w"> </span><span class="nx">asKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;as&quot;</span> <span class="w"> </span><span class="nx">tableKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span 
class="p">=</span><span class="w"> </span><span class="s">&quot;table&quot;</span> <span class="w"> </span><span class="nx">createKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;create&quot;</span> <span class="w"> </span><span class="nx">insertKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;insert&quot;</span> <span class="w"> </span><span class="nx">intoKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;into&quot;</span> <span class="w"> </span><span class="nx">valuesKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;values&quot;</span> <span class="w"> </span><span class="nx">intKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;int&quot;</span> <span class="w"> </span><span class="nx">textKeyword</span><span class="w"> </span><span class="nx">keyword</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;text&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="kt">string</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">semicolonSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;;&quot;</span> <span class="w"> </span><span class="nx">asteriskSymbol</span><span class="w"> 
</span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;*&quot;</span> <span class="w"> </span><span class="nx">commaSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;,&quot;</span> <span class="w"> </span><span class="nx">leftparenSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;(&quot;</span> <span class="w"> </span><span class="nx">rightparenSymbol</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot;)&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">tokenKind</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">keywordKind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">symbolKind</span> <span class="w"> </span><span class="nx">identifierKind</span> <span class="w"> </span><span class="nx">stringKind</span> <span class="w"> </span><span class="nx">numericKind</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span> <span class="w"> </span><span 
class="nx">loc</span><span class="w"> </span><span class="nx">location</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">pointer</span><span class="w"> </span><span class="kt">uint</span> <span class="w"> </span><span class="nx">loc</span><span class="w"> </span><span class="nx">location</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">equals</span><span class="p">(</span><span class="nx">other</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">other</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">other</span><span class="p">.</span><span class="nx">kind</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">lexer</span><span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> 
</span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span> </pre></div> <p>Next we'll write out the main loop:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lex</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">([]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">{}</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cursor</span><span class="p">{}</span> <span class="nx">lex</span><span class="p">:</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">lexers</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">lexer</span><span class="p">{</span><span class="nx">lexKeyword</span><span class="p">,</span><span 
class="w"> </span><span class="nx">lexSymbol</span><span class="p">,</span><span class="w"> </span><span class="nx">lexString</span><span class="p">,</span><span class="w"> </span><span class="nx">lexNumeric</span><span class="p">,</span><span class="w"> </span><span class="nx">lexIdentifier</span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">l</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">lexers</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">l</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="c1">// Omit nil tokens for valid, but empty syntax like newlines</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span 
class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nx">lex</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">hint</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">hint</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="s">&quot; after &quot;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">].</span><span class="nx">value</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Errorf</span><span class="p">(</span><span class="s">&quot;Unable to lex token%s, at %d:%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">hint</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span 
class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>Then we'll write a helper for each kind of fundamental token.</p> <h4 id="analyzing-numbers">Analyzing numbers</h4><p>Numbers are the most complex, so we'll refer to the <a href="https://www.postgresql.org/docs/current/sql-syntax-lexical.html">PostgreSQL documentation (section 4.1.2.6)</a> for what constitutes a valid number.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexNumeric</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span> <span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="nx">isDigit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;0&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;9&#39;</span> <span class="w"> </span><span class="nx">isPeriod</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;.&#39;</span> <span class="w"> </span><span class="nx">isExpMarker</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;e&#39;</span> <span class="w"> </span><span class="c1">// Must start with a digit or period</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isDigit</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="p">!</span><span class="nx">isPeriod</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">isPeriod</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isPeriod</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> 
</span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isExpMarker</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// No periods allowed after expMarker</span> <span class="w"> </span><span class="nx">periodFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="nx">expMarkerFound</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="c1">// expMarker must be followed by digits</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">)</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;-&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">cNext</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;+&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isDigit</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// No characters accumulated</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">:</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">],</span> <span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <h4 id="analyzing-strings">Analyzing strings</h4><p>Strings must start and end with a single apostrophe. They can contain a single apostrophe only if it is immediately followed by another single apostrophe.
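</p> <p>To make the escaping rule concrete, here is a small standalone sketch, separate from the lexer itself. <code>unquoteSQLString</code> is a hypothetical helper (it is not part of gosql) that takes a complete literal, quotes included, and collapses each doubled apostrophe into one.</p>

```go
package main

import (
	"fmt"
	"strings"
)

// unquoteSQLString is a hypothetical helper for illustration only. Given a
// complete literal such as 'it''s', it returns the contents with each
// doubled apostrophe collapsed to one, or false if the literal is malformed.
func unquoteSQLString(literal string) (string, bool) {
	// A lone apostrophe has an opening quote but no closing one.
	if literal == "'" || !strings.HasPrefix(literal, "'") || !strings.HasSuffix(literal, "'") {
		return "", false
	}
	inner := literal[1 : len(literal)-1]

	// Once every escaped pair is removed, no apostrophe may remain.
	if strings.Contains(strings.ReplaceAll(inner, "''", ""), "'") {
		return "", false
	}
	return strings.ReplaceAll(inner, "''", "'"), true
}

func main() {
	fmt.Println(unquoteSQLString("'it''s'")) // it's true
	fmt.Println(unquoteSQLString("'it's'"))  // malformed, so ok is false
}
```

<p>The lexer below applies the same rule one byte at a time instead of using string replacement, because it also has to find where the literal ends.</p> <p>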
We'll put this kind of character-delimited lexing logic into a helper function so we can use it again when analyzing identifiers.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="kt">byte</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">:])</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">source</span><span 
class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> 
</span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// SQL escapes are via double characters, not backslash.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">+</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Step past the closing delimiter so the next lexer resumes after it</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">),</span> <span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span 
class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">)</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">lexString</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span 
class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="sc">&#39;\&#39;&#39;</span><span class="p">)</span> <span class="p">}</span> </pre></div> <h4 id="analyzing-symbols-and-keywords">Analyzing symbols and keywords</h4><p>Symbols come from a fixed set of strings, so they're easy to compare against. 
Whitespace should be thrown away.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexSymbol</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span> <span class="w"> </span><span class="c1">// Will get overwritten later if not an ignored syntax</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Syntax that should be thrown away</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;\n&#39;</span><span class="p">:</span> <span class="w"> 
</span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39;\t&#39;</span><span class="p">:</span> <span class="w"> </span><span class="k">fallthrough</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="sc">&#39; &#39;</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Syntax that should be kept</span> <span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">symbol</span><span class="p">{</span> <span class="w"> </span><span class="nx">commaSymbol</span><span class="p">,</span> <span class="w"> </span><span class="nx">leftParenSymbol</span><span class="p">,</span> <span class="w"> </span><span class="nx">rightParenSymbol</span><span class="p">,</span> <span class="w"> </span><span class="nx">semicolonSymbol</span><span class="p">,</span> <span class="w"> </span><span class="nx">asteriskSymbol</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">options</span><span 
class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">symbols</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">options</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Use `ic`, not `cur`</span> <span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span> <span class="w"> </span><span class="c1">// Unknown character</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">match</span><span class="p">,</span> <span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>Keywords are even simpler, and use the same 
<code>longestMatch</code> helper.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexKeyword</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span> <span class="w"> </span><span class="nx">keywords</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">keyword</span><span class="p">{</span> <span class="w"> </span><span class="nx">selectKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">insertKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">valuesKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">tableKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">createKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">whereKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">fromKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">intoKeyword</span><span class="p">,</span> <span class="w"> </span><span class="nx">textKeyword</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">var</span><span class="w"> 
</span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">k</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">keywords</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">options</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">k</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">))</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">match</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>And finally we implement the <code>longestMatch</code> helper:</p> <div class="highlight"><pre><span></span><span class="c1">// longestMatch iterates through a 
source string starting at the given</span> <span class="c1">// cursor to find the longest matching substring among the provided</span> <span class="c1">// options</span> <span class="kd">func</span><span class="w"> </span><span class="nx">longestMatch</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">[]</span><span class="kt">int</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ic</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToLower</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]))</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">match</span><span class="p">:</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">option</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">options</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">skip</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">skip</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="w"> </span><span class="nx">match</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// 
Deal with cases like INT vs INTO</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">option</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">skipList</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">option</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">match</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">match</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">option</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">sharesPrefix</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">option</span><span class="p">[:</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">-</span><span class="nx">ic</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> 
</span><span class="nx">tooLong</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">option</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tooLong</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">!</span><span class="nx">sharesPrefix</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">skipList</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">skipList</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">skipList</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">options</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">match</span> <span class="p">}</span> </pre></div> <h4 id="analyzing-identifiers">Analyzing identifiers</h4><p>An identifier is either a double-quoted string or a group of characters starting with an alphabetical character and possibly containing numbers 
and underscores (the code below also accepts <code>$</code>).</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">lexIdentifier</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="w"> </span><span class="nx">cursor</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Handle separately if it is a double-quoted identifier</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lexCharacterDelimited</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="sc">&#39;&quot;&#39;</span><span class="p">);</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// lexCharacterDelimited tags its result as stringKind; mark it as an identifier instead</span> <span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">identifierKind</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cur</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">ic</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> </span><span class="c1">// Other characters count too, but ignoring non-ASCII for now</span> <span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;A&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;Z&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;a&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;z&#39;</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span 
class="nx">pointer</span><span class="o">++</span> <span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span><span class="p">{</span><span class="nx">c</span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">source</span><span class="p">));</span><span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">source</span><span class="p">[</span><span class="nx">cur</span><span class="p">.</span><span class="nx">pointer</span><span class="p">]</span> <span class="w"> </span><span class="c1">// Other characters count too, but ignoring non-ASCII for now</span> <span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;A&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span 
class="sc">&#39;Z&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;a&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;z&#39;</span><span class="p">)</span> <span class="w"> </span><span class="nx">isNumeric</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="sc">&#39;0&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="sc">&#39;9&#39;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isAlphabetical</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">isNumeric</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;$&#39;</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="sc">&#39;_&#39;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">)</span> 
<span class="w"> </span><span class="nx">cur</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="o">++</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ic</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="c1">// Unquoted identifiers are case-insensitive</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">ToLower</span><span class="p">(</span><span class="nb">string</span><span class="p">(</span><span class="nx">value</span><span class="p">)),</span> <span class="w"> </span><span class="nx">loc</span><span class="p">:</span><span class="w"> </span><span class="nx">ic</span><span class="p">.</span><span class="nx">loc</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">,</span> <span class="w"> </span><span 
class="p">},</span><span class="w"> </span><span class="nx">cur</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>And that's it for the lexer! If you copy <a href="https://github.com/eatonphil/gosql/blob/master/lexer_test.go">lexer_test.go</a> from the main project, the tests should now pass.</p> <h3 id="ast-model">AST model</h3><p>At the highest level, an AST is a collection of statements:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Ast</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Statements</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">Statement</span> <span class="p">}</span> </pre></div> <p>A statement, for now, is one of <code>INSERT</code>, <code>CREATE</code>, or <code>SELECT</code>:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">AstKind</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">SelectKind</span><span class="w"> </span><span class="nx">AstKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">CreateTableKind</span> <span class="w"> </span><span class="nx">InsertKind</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Statement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span 
class="o">*</span><span class="nx">SelectStatement</span> <span class="w"> </span><span class="nx">CreateTableStatement</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span> <span class="w"> </span><span class="nx">InsertStatement</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span> <span class="w"> </span><span class="nx">Kind</span><span class="w"> </span><span class="nx">AstKind</span> <span class="p">}</span> </pre></div> <h4 id="insert">INSERT</h4><p>An insert statement, for now, has a table name and a list of values to insert:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">InsertStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="nx">token</span> <span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span> <span class="p">}</span> </pre></div> <p>An expression is a literal token or (in the future) a function call or inline operation:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="nx">expressionKind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">expression</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="nx">literal</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">expressionKind</span> <span class="p">}</span> </pre></div> <h4 id="create">CREATE</h4><p>A create statement, for now, has a table name and a list of column names and types:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">columnDefinition</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">token</span> <span class="w"> </span><span class="nx">datatype</span><span class="w"> </span><span class="nx">token</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">CreateTableStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span class="w"> </span><span class="nx">token</span> <span class="w"> </span><span class="nx">cols</span><span class="w"> </span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">columnDefinition</span> <span class="p">}</span> </pre></div> <h4 id="select">SELECT</h4><p>A select statement, for now, has a table name and a list of column names:</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">item</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span> <span class="w"> </span><span class="nx">from</span><span class="w"> </span><span 
class="nx">token</span> <span class="p">}</span> </pre></div> <p>And that's it for the AST.</p> <h3 id="parsing">Parsing</h3><p>The <code>Parse</code> entrypoint will take a list of tokens and attempt to parse statements, separated by a semi-colon, until it reaches the last token.</p> <p>In general our strategy will be to increment and pass around a cursor containing the current position of unparsed tokens. Each helper will return the new cursor that the caller should start from.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;errors&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">k</span><span class="w"> </span><span class="nx">keyword</span><span class="p">)</span><span class="w"> </span><span class="nx">token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">k</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">s</span><span class="w"> </span><span class="nx">symbol</span><span class="p">)</span><span class="w"> </span><span 
class="nx">token</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">token</span><span class="p">{</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">symbolKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">value</span><span class="p">:</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span 
class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">])</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> 
</span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;[%d,%d]: %s, got: %s\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">line</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">loc</span><span class="p">.</span><span class="nx">col</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="p">,</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">Parse</span><span class="p">(</span><span class="nx">source</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Ast</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">lex</span><span class="p">(</span><span class="nx">source</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">Ast</span><span class="p">{}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">stmt</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected statement&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> 
</span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Failed to parse, expected statement&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">a</span><span class="p">.</span><span class="nx">Statements</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">a</span><span class="p">.</span><span class="nx">Statements</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="p">)</span> <span class="w"> </span><span class="nx">atLeastOneSemicolon</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="nx">atLeastOneSemicolon</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">atLeastOneSemicolon</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected semi-colon delimiter between statements&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Missing semi-colon between statements&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h4 id="parsing-statements">Parsing statements</h4><p>Each statement will be one of <code>INSERT</code>, <code>CREATE</code>, or <code>SELECT</code>. 
The <code>parseStatement</code> helper will try a parser for each of these statement types in turn and return <code>true</code> as soon as one of them succeeds.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Statement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="c1">// Look for a SELECT statement</span> <span class="w"> </span><span class="nx">semicolonToken</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">semicolonSymbol</span><span class="p">)</span> <span class="w"> </span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> 
</span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Statement</span><span class="p">{</span> <span class="w"> </span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">SelectKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">SelectStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">slct</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for an INSERT statement</span> <span class="w"> </span><span class="nx">inst</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseInsertStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Statement</span><span class="p">{</span> <span class="w"> 
</span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">InsertKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">InsertStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">inst</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for a CREATE statement</span> <span class="w"> </span><span class="nx">crtTbl</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseCreateTableStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">semicolonToken</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Statement</span><span class="p">{</span> <span class="w"> </span><span class="nx">Kind</span><span class="p">:</span><span class="w"> </span><span class="nx">CreateTableKind</span><span class="p">,</span> <span class="w"> </span><span class="nx">CreateTableStatement</span><span class="p">:</span><span class="w"> </span><span class="nx">crtTbl</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span 
class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <h4 id="parsing-select-statements">Parsing select statements</h4><p>Parsing <code>SELECT</code> statements is easy. We'll look for the following token pattern:</p> <ol> <li><code>SELECT</code></li> <li><code>$expression [, ...]</code></li> <li><code>FROM</code></li> <li><code>$table-name</code></li> </ol> <p>Sketching that out we get:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseSelectStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">selectKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="nx">slct</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">SelectStatement</span><span class="p">{}</span> <span class="w"> </span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">),</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span 
class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">exps</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">fromKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="nx">from</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span 
class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected table name after FROM&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">*</span><span class="nx">from</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">slct</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>The <code>parseToken</code> helper will look for a token of a particular token kind.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">tokenKind</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span 
class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">current</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">current</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span 
class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>The <code>parseExpressions</code> helper will look for tokens separated by a comma until a delimiter is found. It will use existing helpers plus <code>parseExpression</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">expression</span><span class="p">{}</span> <span class="nx">outer</span><span 
class="p">:</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for delimiter</span> <span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">delimiters</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">current</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="w"> </span><span class="nx">outer</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="c1">// Look for comma</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">exps</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for expression</span> <span class="w"> </span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> 
</span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected expression&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">exps</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">exps</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> 
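<p>To see the comma-and-delimiter loop in isolation, here is a minimal, self-contained sketch. It is not the parser from this post: <code>parseList</code> is a hypothetical stand-in that works on plain strings instead of <code>*token</code> values and treats every non-comma, non-delimiter string as an item. It keeps the same convention of returning the result, the new cursor, and an <code>ok</code> flag, and of handing back the initial cursor on failure.</p>

```go
package main

import "fmt"

// parseList is a simplified, hypothetical analogue of parseExpressions.
// It collects comma-separated items starting at initialCursor until it
// reaches delimiter. On success the returned cursor points at the
// delimiter; on failure the initial cursor is returned unchanged so the
// caller can try a different parse.
func parseList(tokens []string, initialCursor uint, delimiter string) ([]string, uint, bool) {
	cursor := initialCursor
	items := []string{}

	for {
		// Running out of tokens before the delimiter is a failure.
		if cursor >= uint(len(tokens)) {
			return nil, initialCursor, false
		}

		// Stop (without consuming) once the delimiter is found.
		if tokens[cursor] == delimiter {
			break
		}

		// Every item after the first must be preceded by a comma.
		if len(items) > 0 {
			if tokens[cursor] != "," {
				return nil, initialCursor, false
			}
			cursor++

			// A comma must be followed by another item.
			if cursor >= uint(len(tokens)) || tokens[cursor] == delimiter {
				return nil, initialCursor, false
			}
		}

		items = append(items, tokens[cursor])
		cursor++
	}

	return items, cursor, true
}

func main() {
	tokens := []string{"select", "id", ",", "name", "from", "users"}
	items, cursor, ok := parseList(tokens, 1, "from")
	fmt.Println(items, cursor, ok)
}
```

<p>Leaving the delimiter unconsumed is what lets the caller decide what to do with it: <code>parseSelectStatement</code> checks for <code>FROM</code> itself after the list, and <code>parseInsertStatement</code> checks for the right paren.</p>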
<p>The <code>parseExpression</code> helper (for now) will look for a numeric, string, or identifier token.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseExpression</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="nx">kinds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">tokenKind</span><span class="p">{</span><span class="nx">identifierKind</span><span class="p">,</span><span class="w"> </span><span class="nx">numericKind</span><span class="p">,</span><span class="w"> </span><span class="nx">stringKind</span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">kinds</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="nx">t</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">kind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">expression</span><span class="p">{</span> <span class="w"> </span><span class="nx">literal</span><span class="p">:</span><span class="w"> </span><span class="nx">t</span><span class="p">,</span> <span class="w"> </span><span class="nx">kind</span><span class="p">:</span><span class="w"> </span><span class="nx">literalKind</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="p">}</span> </pre></div> <p>And that's it for parsing a <code>SELECT</code> statement!</p> <h4 id="parsing-insert-statements">Parsing insert statements</h4><p>We'll look for the following token pattern:</p> <ol> <li><code>INSERT</code></li> <li><code>INTO</code></li> 
<li><code>$table-name</code></li> <li><code>VALUES</code></li> <li><code>(</code></li> <li><code>$expression [, ...]</code></li> <li><code>)</code></li> </ol> <p>With the existing helpers, this is straightforward to sketch out:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseInsertStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="c1">// Look for INSERT</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">insertKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span 
class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="c1">// Look for INTO</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">intoKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected into&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="c1">// Look for table name</span> <span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span 
class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected table name&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="c1">// Look for VALUES</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">valuesKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected VALUES&quot;</span><span 
class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="c1">// Look for left paren</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected left paren&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="c1">// Look for expression list</span> <span class="w"> </span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> 
</span><span class="o">:=</span><span class="w"> </span><span class="nx">parseExpressions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">token</span><span class="p">{</span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">)})</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="c1">// Look for right paren</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected right paren&quot;</span><span 
class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">InsertStatement</span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">table</span><span class="p">,</span> <span class="w"> </span><span class="nx">values</span><span class="p">:</span><span class="w"> </span><span class="nx">values</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>And that's it for parsing an <code>INSERT</code> statement!</p> <h4 id="parsing-create-statements">Parsing create statements</h4><p>Finally, for create statements we'll look for the following token pattern:</p> <ol> <li><code>CREATE</code></li> <li><code>TABLE</code></li> <li><code>$table-name</code></li> <li><code>(</code></li> <li><code>[$column-name $column-type [, ...]]</code></li> <li><code>)</code></li> </ol> <p>Sketching that out with a new <code>parseColumnDefinitions</code> helper we get:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseCreateTableStatement</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span 
class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span class="p">(</span><span class="nx">createKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromKeyword</span><span 
class="p">(</span><span class="nx">tableKeyword</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected table name&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span 
class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">leftparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected left parenthesis&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="nx">cols</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseColumnDefinitions</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span 
class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">rightparenSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected right parenthesis&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">CreateTableStatement</span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span 
class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">name</span><span class="p">,</span> <span class="w"> </span><span class="nx">cols</span><span class="p">:</span><span class="w"> </span><span class="nx">cols</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>The <code>parseColumnDefinitions</code> helper will look for column names followed by column types, separated by commas and ending with some delimiter:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">parseColumnDefinitions</span><span class="p">(</span><span class="nx">tokens</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span class="nx">token</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="nx">delimiter</span><span class="w"> </span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="p">[]</span><span class="o">*</span><span class="nx">columnDefinition</span><span class="p">,</span><span class="w"> </span><span class="kt">uint</span><span class="p">,</span><span class="w"> </span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">initialCursor</span> <span class="w"> </span><span class="nx">cds</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="o">*</span><span 
class="nx">columnDefinition</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="nb">uint</span><span class="p">(</span><span class="nb">len</span><span class="p">(</span><span class="nx">tokens</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for a delimiter</span> <span class="w"> </span><span class="nx">current</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">tokens</span><span class="p">[</span><span class="nx">cursor</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">delimiter</span><span class="p">.</span><span class="nx">equals</span><span class="p">(</span><span class="nx">current</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for a comma</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">cds</span><span class="p">)</span><span class="w"> </span><span class="p">&gt;</span><span class="w"> </span><span class="mi">0</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">expectToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">tokenFromSymbol</span><span class="p">(</span><span class="nx">commaSymbol</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected comma&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="o">++</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Look for a column name</span> <span class="w"> </span><span class="nx">id</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">identifierKind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected column name&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="c1">// Look for a column type</span> <span class="w"> </span><span class="nx">ty</span><span class="p">,</span><span class="w"> </span><span class="nx">newCursor</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parseToken</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="nx">keywordKind</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">helpMessage</span><span class="p">(</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Expected column type&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">initialCursor</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">cursor</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">newCursor</span> <span class="w"> </span><span class="nx">cds</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">cds</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">columnDefinition</span><span class="p">{</span> <span class="w"> </span><span class="nx">name</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">id</span><span class="p">,</span> <span class="w"> </span><span class="nx">datatype</span><span class="p">:</span><span class="w"> </span><span class="o">*</span><span class="nx">ty</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">cds</span><span class="p">,</span><span class="w"> </span><span class="nx">cursor</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> </pre></div> <p>And that's it for parsing! 
If you copy <a href="https://github.com/eatonphil/gosql/blob/master/parser_test.go">parser_test.go</a> from the main project, the tests should now pass.</p> <h3 id="an-in-memory-backend">An in-memory backend</h3><p>Our in-memory backend should conform to a general backend interface that allows a user to create, select, and insert data:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="s">&quot;errors&quot;</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">TextType</span><span class="w"> </span><span class="nx">ColumnType</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">IntType</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Cell</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Results</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Columns</span><span class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="w"> 
</span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">Rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span> <span class="p">}</span> <span class="kd">var</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">ErrTableDoesNotExist</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Table does not exist&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Column does not exist&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">ErrInvalidSelectItem</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Select item is not valid&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">ErrInvalidDatatype</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span class="s">&quot;Invalid datatype&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">ErrMissingValues</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">errors</span><span class="p">.</span><span class="nx">New</span><span class="p">(</span><span 
class="s">&quot;Missing values&quot;</span><span class="p">)</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">Backend</span><span class="w"> </span><span class="kd">interface</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span> <span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>This leaves us room in the future for a disk-backed backend.</p> <h4 id="memory-layout">Memory layout</h4><p>Our in-memory backend should store a list of tables. Each table will have a list of columns and rows. Each column will have a name and type. 
Each row will have a list of byte arrays.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bytes&quot;</span> <span class="w"> </span><span class="s">&quot;encoding/binary&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;strconv&quot;</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">[]</span><span class="kt">byte</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsInt</span><span class="p">()</span><span class="w"> </span><span class="kt">int32</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="kt">int32</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">Read</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">NewBuffer</span><span class="p">(</span><span class="nx">mc</span><span class="p">),</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">i</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span 
class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mc</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">)</span><span class="w"> </span><span class="nx">AsText</span><span class="p">()</span><span class="w"> </span><span class="kt">string</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">string</span><span class="p">(</span><span class="nx">mc</span><span class="p">)</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">table</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">[]</span><span class="kt">string</span> <span class="w"> </span><span class="nx">columnTypes</span><span class="w"> </span><span class="p">[]</span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">rows</span><span class="w"> </span><span class="p">[][]</span><span class="nx">MemoryCell</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">tables</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span 
class="p">]</span><span class="o">*</span><span class="nx">table</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">NewMemoryBackend</span><span class="p">()</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">MemoryBackend</span><span class="p">{</span> <span class="w"> </span><span class="nx">tables</span><span class="p">:</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="o">*</span><span class="nx">table</span><span class="p">{},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="implementing-create-table-support">Implementing create table support</h4><p>When creating a table, we'll make a new entry in the backend tables map. 
Then we'll create columns as specified by the AST.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">crt</span><span class="w"> </span><span class="o">*</span><span class="nx">CreateTableStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">table</span><span class="p">{}</span> <span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">crt</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">t</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span 
class="o">*</span><span class="nx">crt</span><span class="p">.</span><span class="nx">cols</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">name</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">datatype</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;int&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">IntType</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s">&quot;text&quot;</span><span class="p">:</span> <span class="w"> </span><span class="nx">dt</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">TextType</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrInvalidDatatype</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">t</span><span class="p">.</span><span 
class="nx">columnTypes</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">,</span><span class="w"> </span><span class="nx">dt</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h4 id="implementing-insert-support">Implementing insert support</h4><p>Keeping things simple, we'll assume the value passed can be correctly mapped to the type of the column specified.</p> <p>We'll reference a helper for mapping values to internal storage, <code>tokenToCell</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Insert</span><span class="p">(</span><span class="nx">inst</span><span class="w"> </span><span class="o">*</span><span class="nx">InsertStatement</span><span class="p">)</span><span class="w"> </span><span class="kt">error</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">inst</span><span class="p">.</span><span class="nx">table</span><span class="p">.</span><span class="nx">value</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">MemoryCell</span><span class="p">{}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="p">)</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nb">len</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">columns</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ErrMissingValues</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="o">*</span><span class="nx">inst</span><span class="p">.</span><span class="nx">values</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Skipping non-literal.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">row</span><span class="p">,</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tokenToCell</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">literal</span><span class="p">))</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <p>The <code>tokenToCell</code> helper will write numbers as big-endian 32-bit binary and strings as raw bytes:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span 
class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">tokenToCell</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="o">*</span><span class="nx">token</span><span class="p">)</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">numericKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">buf</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nb">new</span><span class="p">(</span><span class="nx">bytes</span><span class="p">.</span><span class="nx">Buffer</span><span class="p">)</span> <span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">Atoi</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">binary</span><span class="p">.</span><span class="nx">Write</span><span class="p">(</span><span class="nx">buf</span><span class="p">,</span><span class="w"> </span><span class="nx">binary</span><span class="p">.</span><span class="nx">BigEndian</span><span class="p">,</span><span class="w"> </span><span class="nb">int32</span><span class="p">(</span><span class="nx">i</span><span class="p">))</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">buf</span><span class="p">.</span><span class="nx">Bytes</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">t</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">stringKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">MemoryCell</span><span class="p">(</span><span class="nx">t</span><span class="p">.</span><span class="nx">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h4 id="implementing-select-support">Implementing select support</h4><p>Finally, for select we'll iterate over each row in the 
table and return the cells according to the columns specified by the AST.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">mb</span><span class="w"> </span><span class="o">*</span><span class="nx">MemoryBackend</span><span class="p">)</span><span class="w"> </span><span class="nx">Select</span><span class="p">(</span><span class="nx">slct</span><span class="w"> </span><span class="o">*</span><span class="nx">SelectStatement</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="o">*</span><span class="nx">Results</span><span class="p">,</span><span class="w"> </span><span class="kt">error</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">table</span><span class="p">,</span><span class="w"> </span><span class="nx">ok</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">tables</span><span class="p">[</span><span class="nx">slct</span><span class="p">.</span><span class="nx">from</span><span class="p">.</span><span class="nx">table</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span class="nx">ok</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrTableDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[][]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="o">:=</span><span 
class="w"> </span><span class="p">[]</span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="p">}{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">rows</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">Cell</span><span class="p">{}</span> <span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">exp</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">slct</span><span class="p">.</span><span class="nx">item</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">literalKind</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="c1">// Unsupported, doesn&#39;t currently exist, ignore.</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Skipping non-literal expression.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">lit</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">literal</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">identifierKind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="kc">false</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">columns</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">tableCol</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">isFirstRow</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">columns</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">columns</span><span class="p">,</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">Type</span><span class="w"> </span><span class="nx">ColumnType</span> <span class="w"> </span><span class="nx">Name</span><span class="w"> </span><span class="kt">string</span> <span class="w"> </span><span class="p">}{</span> <span class="w"> </span><span class="nx">Type</span><span class="p">:</span><span class="w"> </span><span class="nx">table</span><span class="p">.</span><span class="nx">columnTypes</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="w"> </span><span class="nx">Name</span><span class="p">:</span><span class="w"> </span><span class="nx">lit</span><span class="p">.</span><span class="nx">value</span><span class="p">,</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">result</span><span class="p">,</span><span class="w"> </span><span class="nx">row</span><span class="p">[</span><span class="nx">i</span><span class="p">])</span> <span class="w"> </span><span class="nx">found</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="w"> </span><span class="k">break</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">!</span><span 
class="nx">found</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">continue</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">ErrColumnDoesNotExist</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">Results</span><span class="p">{</span> <span class="w"> </span><span class="nx">Columns</span><span class="p">:</span><span class="w"> </span><span class="nx">columns</span><span class="p">,</span> <span class="w"> </span><span class="nx">Rows</span><span class="p">:</span><span class="w"> </span><span class="nx">results</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="kc">nil</span> <span class="p">}</span> </pre></div> <h3 id="the-repl">The REPL</h3><p>At last, we're ready to wrap the parser and in-memory backend in a REPL. 
The most complex part is displaying the table of results from a select query.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;bufio&quot;</span> <span class="w"> </span><span class="s">&quot;fmt&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;strings&quot;</span> <span class="w"> </span><span class="s">&quot;github.com/eatonphil/gosql&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">mb</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">NewMemoryBackend</span><span class="p">()</span> <span class="w"> </span><span class="nx">reader</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">bufio</span><span class="p">.</span><span class="nx">NewReader</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Stdin</span><span class="p">)</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;Welcome to gosql.&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Print</span><span class="p">(</span><span class="s">&quot;# &quot;</span><span class="p">)</span> <span class="w"> </span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">reader</span><span class="p">.</span><span class="nx">ReadString</span><span class="p">(</span><span class="sc">&#39;\n&#39;</span><span class="p">)</span> <span class="w"> </span><span class="nx">text</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">strings</span><span class="p">.</span><span class="nx">Replace</span><span class="p">(</span><span class="nx">text</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;\n&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;&quot;</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="mi">1</span><span class="p">)</span> <span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">Parse</span><span class="p">(</span><span class="nx">text</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Statements</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">CreateTableKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">CreateTable</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">CreateTableStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;ok&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">InsertKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Insert</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span 
class="nx">InsertStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;ok&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">SelectKind</span><span class="p">:</span> <span class="w"> </span><span class="nx">results</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">mb</span><span class="p">.</span><span class="nx">Select</span><span class="p">(</span><span class="nx">stmt</span><span class="p">.</span><span class="nx">SelectStatement</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">panic</span><span class="p">(</span><span class="nx">err</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span 
class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;| %s &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">col</span><span class="p">.</span><span class="nx">Name</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;|&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="p">&lt;</span><span class="w"> </span><span class="mi">20</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;=&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Rows</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot;|&quot;</span><span class="p">)</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">cell</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">Columns</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">Type</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">typ</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span class="nx">IntType</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Sprintf</span><span class="p">(</span><span class="s">&quot;%d&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsInt</span><span class="p">())</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">gosql</span><span class="p">.</span><span 
class="nx">TextType</span><span class="p">:</span> <span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">cell</span><span class="p">.</span><span class="nx">AsText</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Printf</span><span class="p">(</span><span class="s">&quot; %s | &quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">()</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fmt</span><span class="p">.</span><span class="nx">Println</span><span class="p">(</span><span class="s">&quot;ok&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Putting it all together:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>run<span class="w"> </span>*.go Welcome<span class="w"> </span>to<span class="w"> </span>gosql. 
<span class="c1"># CREATE TABLE users (id INT, name TEXT);</span> ok <span class="c1"># INSERT INTO users VALUES (1, &#39;Phil&#39;);</span> ok <span class="c1"># SELECT id, name FROM users;</span> <span class="p">|</span><span class="w"> </span>id<span class="w"> </span><span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span> <span class="o">====================</span> <span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span><span class="w"> </span>Phil<span class="w"> </span><span class="p">|</span> ok <span class="c1"># INSERT INTO users VALUES (2, &#39;Kate&#39;);</span> ok <span class="c1"># SELECT name, id FROM users;</span> <span class="p">|</span><span class="w"> </span>name<span class="w"> </span><span class="p">|</span><span class="w"> </span>id<span class="w"> </span><span class="p">|</span> <span class="o">====================</span> <span class="p">|</span><span class="w"> </span>Phil<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">|</span> <span class="p">|</span><span class="w"> </span>Kate<span class="w"> </span><span class="p">|</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">|</span> ok </pre></div> <p>And we've got a very simple SQL database!</p> <p>Next up we'll get into filtering, sorting, and indexing.</p> <h4 id="further-reading">Further reading</h4><ul> <li><a href="/writing-a-simple-json-parser.html">Writing a simple JSON parser</a><ul> <li>This post goes into a little more detail about the theory and basics of parsing.</li> </ul> </li> <li><a href="https://www.goodreads.com/book/show/617120.Database_Systems">Database Systems: A Practical Approach to Design, Implementation and Management</a><ul> <li>A giant book, but an excellent and very easy introduction to database theory.</li> </ul> </li> </ul> 
<p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Latest blog post: writing a simple SQL database from scratch in Go <a href="https://t.co/csQmNhWIEf">https://t.co/csQmNhWIEf</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1237522975143776256?ref_src=twsrc%5Etfw">March 10, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/database-basics.htmlFri, 06 Mar 2020 00:00:00 +0000A minimal REST API in Javahttp://notes.eatonphil.com/a-minimal-rest-api-in-java.html<p>There's a style of Java that is a joy to write. This post will cover how to set up a basic PostgreSQL-integrated REST API using <a href="https://eclipse-ee4j.github.io/jersey/">Jersey</a> and <a href="https://www.jooq.org/">jOOQ</a> in a style not dissimilar to Flask and SQLAlchemy in Python.</p> <p>In particular, we'll try to avoid as much runtime reflection/class-loading as possible. 
This will make the application less flexible but easier to debug and understand.</p> <p>I'd appreciate pointers in email if you see anything weird or can fix any of my bugs.</p> <h3 id="dependencies">Dependencies</h3><p>Install <a href="https://maven.apache.org/">Maven</a>, a recent <a href="https://openjdk.java.net/">JDK</a>, and PostgreSQL.</p> <p>Copy the following into <code>pom.xml</code> to tell Maven about Java dependencies:</p> <div class="highlight"><pre><span></span><span class="nt">&lt;project</span><span class="w"> </span><span class="na">xmlns=</span><span class="s">&quot;http://maven.apache.org/POM/4.0.0&quot;</span><span class="w"> </span><span class="na">xmlns:xsi=</span><span class="s">&quot;http://www.w3.org/2001/XMLSchema-instance&quot;</span> <span class="w"> </span><span class="na">xsi:schemaLocation=</span><span class="s">&quot;http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd&quot;</span><span class="nt">&gt;</span> <span class="w"> </span><span class="nt">&lt;modelVersion&gt;</span>4.0.0<span class="nt">&lt;/modelVersion&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>api<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>api<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.0-SNAPSHOT<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;properties&gt;</span> <span class="w"> </span><span class="nt">&lt;maven.compiler.source&gt;</span>13<span class="nt">&lt;/maven.compiler.source&gt;</span> <span class="w"> </span><span class="nt">&lt;maven.compiler.target&gt;</span>13<span class="nt">&lt;/maven.compiler.target&gt;</span> <span class="w"> </span><span class="nt">&lt;/properties&gt;</span> <span class="w"> </span><span class="nt">&lt;build&gt;</span> <span class="w"> </span><span class="nt">&lt;plugins&gt;</span> <span class="w"> </span><span 
class="nt">&lt;plugin&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.apache.maven.plugins<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>maven-compiler-plugin<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>3.8.1<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;configuration&gt;</span> <span class="w"> </span><span class="nt">&lt;compilerArgs&gt;</span> <span class="w"> </span><span class="nt">&lt;arg&gt;</span>-Xlint:all,-options,-path<span class="nt">&lt;/arg&gt;</span> <span class="w"> </span><span class="nt">&lt;/compilerArgs&gt;</span> <span class="w"> </span><span class="nt">&lt;/configuration&gt;</span> <span class="w"> </span><span class="nt">&lt;/plugin&gt;</span> <span class="w"> </span><span class="nt">&lt;plugin&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.codehaus.mojo<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>exec-maven-plugin<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.6.0<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;configuration&gt;</span> <span class="w"> </span><span class="nt">&lt;mainClass&gt;</span>api.Main<span class="nt">&lt;/mainClass&gt;</span> <span class="w"> </span><span class="nt">&lt;/configuration&gt;</span> <span class="w"> </span><span class="nt">&lt;/plugin&gt;</span> <span class="w"> </span><span class="nt">&lt;/plugins&gt;</span> <span class="w"> </span><span class="nt">&lt;/build&gt;</span> <span class="w"> </span><span class="nt">&lt;dependencies&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.glassfish.jersey.containers<span 
class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jersey-container-jetty-http<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.30<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.jooq<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jooq<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>3.12.3<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.jooq<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jooq-meta<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>3.12.3<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.postgresql<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>postgresql<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>42.2.9<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.glassfish.jersey.inject<span class="nt">&lt;/groupId&gt;</span> <span class="w"> 
</span><span class="nt">&lt;artifactId&gt;</span>jersey-hk2<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.30<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>ch.qos.logback<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>logback-core<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.2.3<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.slf4j<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>slf4j-api<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.7.30<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>ch.qos.logback<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>logback-classic<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.2.3<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.glassfish.jersey.media<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span 
class="nt">&lt;artifactId&gt;</span>jersey-media-json-jackson<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.30<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>javax.persistence<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>javax.persistence-api<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.2<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>org.projectlombok<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>lombok<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>1.18.10<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;scope&gt;</span>provided<span class="nt">&lt;/scope&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;dependency&gt;</span> <span class="w"> </span><span class="nt">&lt;groupId&gt;</span>com.fasterxml.jackson<span class="nt">&lt;/groupId&gt;</span> <span class="w"> </span><span class="nt">&lt;artifactId&gt;</span>jackson-bom<span class="nt">&lt;/artifactId&gt;</span> <span class="w"> </span><span class="nt">&lt;version&gt;</span>2.10.2<span class="nt">&lt;/version&gt;</span> <span class="w"> </span><span class="nt">&lt;type&gt;</span>pom<span class="nt">&lt;/type&gt;</span> <span class="w"> </span><span class="nt">&lt;/dependency&gt;</span> <span class="w"> </span><span 
class="nt">&lt;/dependencies&gt;</span> <span class="nt">&lt;/project&gt;</span> </pre></div> <p>Now run <code>mvn install</code> to download and configure all dependencies.</p> <h3 id="project-setup">Project setup</h3><p>The <code>Main</code> class in <code>src/main/java/api/Main.java</code> will be our entrypoint.</p> <p>It will handle loading configuration, setting up the application server, and starting it.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">api.app.Application</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">api.app.Config</span><span class="p">;</span> <span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Main</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">(</span><span class="n">String</span><span class="o">[]</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Config</span><span class="p">();</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">server</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Application</span><span class="p">(</span><span class="n">cfg</span><span class="p">);</span> <span class="w"> </span><span class="n">server</span><span class="p">.</span><span class="na">start</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>The <code>Config</code> class in <code>src/main/java/api/app/Config.java</code> will contain a few hard-coded settings for now. 
In the future these could be read from a file.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.app</span><span class="p">;</span> <span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Config</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">server_address</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;http://localhost&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">server_port</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">7780</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">db_connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;jdbc:postgresql://localhost/todo&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> 
</span><span class="n">String</span><span class="w"> </span><span class="n">db_username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;todo&quot;</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">db_password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;todo&quot;</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And finally the <code>Application</code> class in <code>src/main/java/api/app/Application.java</code> will handle loading a PostgreSQL connection, registering the class path to look for Jersey routes/controllers, registering the PostgreSQL connection in the dependency injection controller and starting the Jersey controller.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.app</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.core.UriBuilder</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.internal.inject.AbstractBinder</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.jetty.JettyHttpContainerFactory</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.glassfish.jersey.server.ResourceConfig</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.slf4j.LoggerFactory</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">api.dao.Dao</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span 
class="nn">ch.qos.logback.classic.Level</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">ch.qos.logback.classic.Logger</span><span class="p">;</span> <span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Application</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Logger</span><span class="w"> </span><span class="n">logger</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Logger</span><span class="p">)</span><span class="w"> </span><span class="n">LoggerFactory</span><span class="p">.</span><span class="na">getLogger</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="na">class</span><span class="p">);</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">Logger</span><span class="w"> </span><span class="n">rootLogger</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">Logger</span><span class="p">)</span><span class="w"> </span><span class="n">LoggerFactory</span><span class="p">.</span><span class="na">getLogger</span><span class="p">(</span><span class="n">Logger</span><span class="p">.</span><span class="na">ROOT_LOGGER_NAME</span><span class="p">);</span> <span class="w"> </span><span class="kd">static</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">rootLogger</span><span class="p">.</span><span class="na">setLevel</span><span class="p">(</span><span class="n">Level</span><span 
class="p">.</span><span class="na">INFO</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="n">cfg</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">Application</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">Config</span><span class="w"> </span><span class="n">_cfg</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">cfg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_cfg</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">addShutdownHook</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">Runnable</span><span class="w"> </span><span class="n">hook</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Runtime</span><span class="p">.</span><span class="na">getRuntime</span><span class="p">().</span><span class="na">addShutdownHook</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">Thread</span><span class="p">(</span><span class="n">hook</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">start</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">dao</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">Dao</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">db_connection</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">db_username</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">db_password</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">initialize</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">Exception</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">addShutdownHook</span><span class="p">(()</span><span class="w"> </span><span class="o">-&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dao</span><span class="p">.</span><span class="na">close</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="n">java</span><span class="p">.</span><span class="na">sql</span><span class="p">.</span><span 
class="na">SQLException</span><span class="w"> </span><span class="n">e</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">e</span><span class="p">.</span><span class="na">printStackTrace</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">resourceConfig</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">new</span><span class="w"> </span><span class="n">ResourceConfig</span><span class="p">();</span> <span class="w"> </span><span class="n">resourceConfig</span><span class="p">.</span><span class="na">packages</span><span class="p">(</span><span class="s">&quot;api.controller&quot;</span><span class="p">);</span> <span class="w"> </span><span class="n">resourceConfig</span><span class="p">.</span><span class="na">register</span><span class="p">(</span><span class="k">new</span><span class="w"> </span><span class="n">AbstractBinder</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nd">@Override</span> <span class="w"> </span><span class="kd">protected</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">configure</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">bind</span><span class="p">(</span><span class="n">dao</span><span class="p">).</span><span class="na">to</span><span class="p">(</span><span class="n">Dao</span><span class="p">.</span><span class="na">class</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">baseUri</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">UriBuilder</span><span class="p">.</span><span class="na">fromUri</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">server_address</span><span class="p">).</span><span class="na">port</span><span class="p">(</span><span class="n">cfg</span><span class="p">.</span><span class="na">server_port</span><span class="p">).</span><span class="na">build</span><span class="p">();</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="n">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">JettyHttpContainerFactory</span><span class="p">.</span><span class="na">createServer</span><span class="p">(</span><span class="n">baseUri</span><span class="p">,</span><span class="w"> </span><span class="n">resourceConfig</span><span class="p">);</span> <span class="w"> </span><span class="n">logger</span><span class="p">.</span><span class="na">info</span><span class="p">(</span><span class="s">&quot;Started listening on {}:{}&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">server_address</span><span class="p">,</span><span class="w"> </span><span class="n">cfg</span><span class="p">.</span><span class="na">server_port</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p class="note"> I couldn't figure out a reasonable way to avoid the class path registration for routes. <br /> <br /> It's also important to note that the <code>AbstractBinder</code> appears to search the class path implicitly for any available dependency injection controller. I'd rather we had specified it explicitly but I'm not sure how. 
It will succeed because we installed <a href="https://javaee.github.io/hk2/">HK2</a> as a dependency (see <code>pom.xml</code>). </p><p>With the <code>Application</code> code finished, we'll need to build out the referenced <code>Dao</code> and controller classes.</p> <h3 id="dao">Dao</h3><p>The <code>Dao</code> class in <code>src/main/java/api/dao/Dao.java</code> will enclose the connection to PostgreSQL via JOOQ.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.dao</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.Connection</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.DriverManager</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">java.sql.SQLException</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.DSLContext</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.SQLDialect</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.impl.DSL</span><span class="p">;</span> <span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">Dao</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">Connection</span><span class="w"> </span><span class="n">conn</span><span class="p">;</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">url</span><span class="p">;</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span 
class="n">username</span><span class="p">;</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">password</span><span class="p">;</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="nf">Dao</span><span class="p">(</span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_url</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_username</span><span class="p">,</span><span class="w"> </span><span class="kd">final</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">_password</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">url</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_url</span><span class="p">;</span> <span class="w"> </span><span class="n">username</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_username</span><span class="p">;</span> <span class="w"> </span><span class="n">password</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_password</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">initialize</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">SQLException</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">conn</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">DriverManager</span><span class="p">.</span><span class="na">getConnection</span><span class="p">(</span><span class="n">url</span><span class="p">,</span><span class="w"> </span><span class="n">username</span><span class="p">,</span><span class="w"> </span><span class="n">password</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="kt">void</span><span class="w"> </span><span class="nf">close</span><span class="p">()</span><span class="w"> </span><span class="kd">throws</span><span class="w"> </span><span class="n">SQLException</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">conn</span><span class="p">.</span><span class="na">close</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">DSLContext</span><span class="w"> </span><span class="nf">getDSLContext</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">DSL</span><span class="p">.</span><span class="na">using</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span><span class="w"> </span><span class="n">SQLDialect</span><span class="p">.</span><span class="na">POSTGRES</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And this will be enough to use in our controller. But let's take a moment to talk about the data model.</p> <h3 id="data">Data</h3><p>This API will return results from a TODO list. 
The database should store each TODO item and a timestamp of completion, if completed.</p> <p>We'll start by creating a database and user for the application:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>su<span class="w"> </span>postgres postgres<span class="w"> </span>$<span class="w"> </span>psql <span class="nv">postgres</span><span class="o">=</span><span class="c1"># CREATE DATABASE todo;</span> <span class="nv">postgres</span><span class="o">=</span><span class="c1"># CREATE USER todo WITH PASSWORD &#39;todo&#39;;</span> <span class="nv">postgres</span><span class="o">=</span><span class="c1"># GRANT ALL ON DATABASE todo TO todo;</span> </pre></div> <p>Then we'll write an initial migration:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">cat</span><span class="w"> </span><span class="n">migrations</span><span class="o">/</span><span class="mi">1</span><span class="n">_init</span><span class="p">.</span><span class="k">sql</span> <span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">todo_item</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="n">id</span><span class="w"> </span><span class="n">BIGSERIAL</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="w"> </span><span class="k">PRIMARY</span><span class="w"> </span><span class="k">KEY</span><span class="p">,</span> <span class="w"> </span><span class="n">item</span><span class="w"> </span><span class="nb">TEXT</span><span class="w"> </span><span class="k">NOT</span><span class="w"> </span><span class="k">NULL</span><span class="p">,</span> <span class="w"> </span><span class="n">created_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span><span class="w"> </span><span class="k">NOT</span><span class="w"> 
</span><span class="k">NULL</span><span class="w"> </span><span class="k">DEFAULT</span><span class="w"> </span><span class="n">NOW</span><span class="p">(),</span> <span class="w"> </span><span class="n">completed_at</span><span class="w"> </span><span class="n">TIMESTAMPTZ</span> <span class="p">);</span> </pre></div> <p>And a helper script for running migrations:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>scripts/migrate.sh <span class="c1">#!/usr/bin/env bash</span> <span class="nb">set</span><span class="w"> </span>-e <span class="nb">export</span><span class="w"> </span><span class="nv">PGPASSWORD</span><span class="o">=</span>todo <span class="k">for</span><span class="w"> </span>file<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="k">$(</span>ls<span class="w"> </span>migrations<span class="k">)</span><span class="p">;</span><span class="w"> </span><span class="k">do</span> <span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="s2">&quot;Running migration: </span><span class="nv">$file</span><span class="s2">&quot;</span> <span class="w"> </span>psql<span class="w"> </span>-U<span class="w"> </span>todo<span class="w"> </span>-f<span class="w"> </span><span class="s2">&quot;migrations/</span><span class="nv">$file</span><span class="s2">&quot;</span> <span class="k">done</span> </pre></div> <p>Run it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>chmod<span class="w"> </span>+x<span class="w"> </span>./scripts/migrate.sh $<span class="w"> </span>./scripts/migrate.sh Running<span class="w"> </span>migration:<span class="w"> </span>1_init.sql CREATE<span class="w"> </span>TABLE </pre></div> <p>And let's add some data:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sudo<span class="w"> </span>su<span class="w"> </span>postgres postgres<span class="w"> </span>$<span class="w"> 
</span>psql<span class="w"> </span>-U<span class="w"> </span>todo <span class="nv">todo</span><span class="o">=</span><span class="c1"># INSERT INTO todo_item (item) VALUES (&#39;My note&#39;);</span> </pre></div> <p>Now we're ready to model the data in Java.</p> <h3 id="models">Models</h3><p>While it's possible to have <a href="https://www.jooq.org/doc/3.12/manual/code-generation/">jOOQ generate Java data classes</a> (or POJOs) by reading the database schema, the generated class cannot be directly serialized to a JSON string.</p> <p>So for each table (there's only one) we'll write a class with a field for each column. We'll use the <a href="https://javaee.github.io/tutorial/persistence-intro.html">Java Persistence API</a> (JPA) to annotate the class and fields so jOOQ will know how to deserialize query results into an instance of the model.</p> <p>We'll use <a href="https://projectlombok.org/">Lombok</a> to annotate the class with <code>@Data</code> so that getter and setter methods are generated automatically for each private field. 
And we'll use a <a href="https://github.com/FasterXML/jackson">Jackson</a> annotation to label the JSON field name of each column.</p> <p>This is the <code>TodoItem</code> class in <code>src/main/java/api/model/TodoItem.java</code>:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.model</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">java.time.OffsetDateTime</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Column</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Id</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Table</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">com.fasterxml.jackson.annotation.JsonFormat</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">com.fasterxml.jackson.annotation.JsonProperty</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">lombok.Data</span><span class="p">;</span> <span class="nd">@Data</span> <span class="nd">@Table</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;todo_item&quot;</span><span class="p">)</span> <span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">TodoItem</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;id&quot;</span><span class="p">)</span> <span class="w"> </span><span 
class="nd">@JsonProperty</span><span class="p">(</span><span class="s">&quot;id&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@Id</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="kt">long</span><span class="w"> </span><span class="n">id</span><span class="p">;</span> <span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;name&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">&quot;name&quot;</span><span class="p">)</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">String</span><span class="w"> </span><span class="n">name</span><span class="p">;</span> <span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;created_at&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">&quot;createdAt&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@JsonFormat</span><span class="p">(</span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;yyyy-MM-dd&#39;T&#39;HH:mm:ssZ&quot;</span><span class="p">)</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">OffsetDateTime</span><span class="w"> </span><span class="n">createdAt</span><span class="p">;</span> <span class="w"> </span><span class="nd">@Column</span><span class="p">(</span><span class="n">name</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s">&quot;completed_at&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@JsonProperty</span><span class="p">(</span><span class="s">&quot;completedAt&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nd">@JsonFormat</span><span class="p">(</span><span class="n">pattern</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s">&quot;yyyy-MM-dd&#39;T&#39;HH:mm:ssZ&quot;</span><span class="p">)</span> <span class="w"> </span><span class="kd">private</span><span class="w"> </span><span class="n">OffsetDateTime</span><span class="w"> </span><span class="n">completedAt</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p class="note"> The <code>@JsonFormat</code> patterns on the timestamp fields aren't actually taking effect. Each timestamp is serialized as a giant object and I haven't figured out how to get it to serialize to an RFC 3339 string yet. </p><p>We're set! The last step is a simple controller to return a list of TODO items.</p> <h3 id="the-/items-controller">The /items controller</h3><p>In the <code>ItemsController</code> class in <code>src/main/java/api/controller/ItemsController.java</code> we'll inject the <code>Dao</code> object and use it to return the list of TODO items as JSON.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nn">api.controller</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">java.util.List</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.inject.Inject</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.persistence.Table</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.GET</span><span class="p">;</span> <span class="kn">import</span><span class="w"> 
</span><span class="nn">javax.ws.rs.Path</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.Produces</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">javax.ws.rs.core.MediaType</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">org.jooq.DSLContext</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">api.dao.Dao</span><span class="p">;</span> <span class="kn">import</span><span class="w"> </span><span class="nn">api.model.TodoItem</span><span class="p">;</span> <span class="nd">@Path</span><span class="p">(</span><span class="s">&quot;items&quot;</span><span class="p">)</span> <span class="kd">public</span><span class="w"> </span><span class="kd">class</span> <span class="nc">ItemsController</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nd">@Inject</span> <span class="w"> </span><span class="n">Dao</span><span class="w"> </span><span class="n">dao</span><span class="p">;</span> <span class="w"> </span><span class="nd">@GET</span> <span class="w"> </span><span class="nd">@Produces</span><span class="p">(</span><span class="n">MediaType</span><span class="p">.</span><span class="na">APPLICATION_JSON</span><span class="p">)</span> <span class="w"> </span><span class="kd">public</span><span class="w"> </span><span class="n">List</span><span class="o">&lt;</span><span class="n">TodoItem</span><span class="o">&gt;</span><span class="w"> </span><span class="nf">getItems</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">DSLContext</span><span class="w"> </span><span class="n">dslCtx</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dao</span><span class="p">.</span><span 
class="na">getDSLContext</span><span class="p">();</span> <span class="w"> </span><span class="n">Table</span><span class="w"> </span><span class="n">table</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">TodoItem</span><span class="p">.</span><span class="na">class</span><span class="p">.</span><span class="na">getAnnotation</span><span class="p">(</span><span class="n">Table</span><span class="p">.</span><span class="na">class</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">dslCtx</span><span class="p">.</span><span class="na">select</span><span class="p">().</span><span class="na">from</span><span class="p">(</span><span class="n">table</span><span class="p">.</span><span class="na">name</span><span class="p">()).</span><span class="na">fetch</span><span class="p">().</span><span class="na">into</span><span class="p">(</span><span class="n">TodoItem</span><span class="p">.</span><span class="na">class</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p class="note"> There's some more implicit magic here when we return a list of <code>TodoItem</code>s. Since we marked the endpoint as producing JSON, and since Jackson is in our class path, Jersey will automatically use Jackson to serialize the list to JSON. <br /> <br /> The API is quite nice but I could do without the automatic class-loading magic. </p><p>Now we're ready to build, run and test.</p> <h3 id="building-and-running">Building and running</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>mvn<span class="w"> </span>clean<span class="w"> </span>compile <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Scanning<span class="w"> </span><span class="k">for</span><span class="w"> </span>projects... 
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------&lt;<span class="w"> </span>api:api<span class="w"> </span>&gt;------------------------------- <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>api<span class="w"> </span><span class="m">1</span>.0-SNAPSHOT <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>--------------------------------<span class="o">[</span><span class="w"> </span>jar<span class="w"> </span><span class="o">]</span>--------------------------------- <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-clean-plugin:2.5:clean<span class="w"> </span><span class="o">(</span>default-clean<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>--- <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Deleting<span class="w"> </span>/Users/philipeaton/tmp/test/target <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-resources-plugin:2.6:resources<span class="w"> </span><span class="o">(</span>default-resources<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>--- <span class="o">[</span>WARNING<span class="o">]</span><span class="w"> </span>Using<span class="w"> </span>platform<span class="w"> </span>encoding<span class="w"> </span><span class="o">(</span>UTF-8<span class="w"> </span>actually<span class="o">)</span><span class="w"> </span>to<span class="w"> </span>copy<span class="w"> </span>filtered<span class="w"> </span>resources,<span class="w"> 
</span>i.e.<span class="w"> </span>build<span class="w"> </span>is<span class="w"> </span>platform<span class="w"> </span>dependent! <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>skip<span class="w"> </span>non<span class="w"> </span>existing<span class="w"> </span>resourceDirectory<span class="w"> </span>/Users/philipeaton/tmp/test/src/main/resources <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>maven-compiler-plugin:3.8.1:compile<span class="w"> </span><span class="o">(</span>default-compile<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>--- <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Changes<span class="w"> </span>detected<span class="w"> </span>-<span class="w"> </span>recompiling<span class="w"> </span>the<span class="w"> </span>module! <span class="o">[</span>WARNING<span class="o">]</span><span class="w"> </span>File<span class="w"> </span>encoding<span class="w"> </span>has<span class="w"> </span>not<span class="w"> </span>been<span class="w"> </span>set,<span class="w"> </span>using<span class="w"> </span>platform<span class="w"> </span>encoding<span class="w"> </span>UTF-8,<span class="w"> </span>i.e.<span class="w"> </span>build<span class="w"> </span>is<span class="w"> </span>platform<span class="w"> </span>dependent! 
<span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Compiling<span class="w"> </span><span class="m">6</span><span class="w"> </span><span class="nb">source</span><span class="w"> </span>files<span class="w"> </span>to<span class="w"> </span>/Users/philipeaton/tmp/test/target/classes <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------ <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>BUILD<span class="w"> </span>SUCCESS <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------ <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Total<span class="w"> </span>time:<span class="w"> </span><span class="m">2</span>.198<span class="w"> </span>s <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Finished<span class="w"> </span>at:<span class="w"> </span><span class="m">2020</span>-02-01T17:07:14-05:00 <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------------------------------------------------ $<span class="w"> </span>mvn<span class="w"> </span>exec:java <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Scanning<span class="w"> </span><span class="k">for</span><span class="w"> </span>projects... 
<span class="o">[</span>INFO<span class="o">]</span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>------------------------------&lt;<span class="w"> </span>api:api<span class="w"> </span>&gt;------------------------------- <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>api<span class="w"> </span><span class="m">1</span>.0-SNAPSHOT <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>--------------------------------<span class="o">[</span><span class="w"> </span>jar<span class="w"> </span><span class="o">]</span>--------------------------------- <span class="o">[</span>INFO<span class="o">]</span> <span class="o">[</span>INFO<span class="o">]</span><span class="w"> </span>---<span class="w"> </span>exec-maven-plugin:1.6.0:java<span class="w"> </span><span class="o">(</span>default-cli<span class="o">)</span><span class="w"> </span>@<span class="w"> </span>api<span class="w"> </span>--- <span class="m">17</span>:06:53.793<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.util.log<span class="w"> </span>-<span class="w"> </span>Logging<span class="w"> </span>initialized<span class="w"> </span>@2017ms<span class="w"> </span>to<span class="w"> </span>org.eclipse.jetty.util.log.Slf4jLog <span class="m">17</span>:06:54.378<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.Server<span class="w"> </span>-<span class="w"> </span>jetty-9.4.17.v20190418<span class="p">;</span><span class="w"> </span>built:<span class="w"> </span><span class="m">2019</span>-04-18T19:45:35.259Z<span class="p">;</span><span class="w"> </span>git:<span class="w"> </span>aa1c656c315c011c01e7b21aabb04066635b9f67<span class="p">;</span><span class="w"> </span>jvm<span 
class="w"> </span><span class="m">13</span>+33 <span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.AbstractConnector<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>ServerConnector@3943a159<span class="o">{</span>HTTP/1.1,<span class="o">[</span>http/1.1<span class="o">]}{</span><span class="m">0</span>.0.0.0:7780<span class="o">}</span> <span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>org.eclipse.jetty.server.Server<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>@2651ms <span class="m">17</span>:06:54.425<span class="w"> </span><span class="o">[</span>api.Main.main<span class="o">()]</span><span class="w"> </span>INFO<span class="w"> </span>api.app.Application<span class="w"> </span>-<span class="w"> </span>Started<span class="w"> </span>listening<span class="w"> </span>on<span class="w"> </span>http://localhost:7780 </pre></div> <p>In a new terminal curl the endpoint:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>localhost:7780/items<span class="w"> </span><span class="p">|</span><span class="w"> </span>jq <span class="o">[</span> <span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;name&quot;</span>:<span class="w"> </span>null, <span class="w"> </span><span class="s2">&quot;createdAt&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;offset&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span 
class="s2">&quot;totalSeconds&quot;</span>:<span class="w"> </span>-18000, <span class="w"> </span><span class="s2">&quot;id&quot;</span>:<span class="w"> </span><span class="s2">&quot;-05:00&quot;</span>, <span class="w"> </span><span class="s2">&quot;rules&quot;</span>:<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="s2">&quot;transitions&quot;</span>:<span class="w"> </span><span class="o">[]</span>, <span class="w"> </span><span class="s2">&quot;transitionRules&quot;</span>:<span class="w"> </span><span class="o">[]</span>, <span class="w"> </span><span class="s2">&quot;fixedOffset&quot;</span>:<span class="w"> </span><span class="nb">true</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;dayOfWeek&quot;</span>:<span class="w"> </span><span class="s2">&quot;SATURDAY&quot;</span>, <span class="w"> </span><span class="s2">&quot;dayOfYear&quot;</span>:<span class="w"> </span><span class="m">32</span>, <span class="w"> </span><span class="s2">&quot;nano&quot;</span>:<span class="w"> </span><span class="m">594440000</span>, <span class="w"> </span><span class="s2">&quot;year&quot;</span>:<span class="w"> </span><span class="m">2020</span>, <span class="w"> </span><span class="s2">&quot;monthValue&quot;</span>:<span class="w"> </span><span class="m">2</span>, <span class="w"> </span><span class="s2">&quot;dayOfMonth&quot;</span>:<span class="w"> </span><span class="m">1</span>, <span class="w"> </span><span class="s2">&quot;hour&quot;</span>:<span class="w"> </span><span class="m">17</span>, <span class="w"> </span><span class="s2">&quot;minute&quot;</span>:<span class="w"> </span><span class="m">8</span>, <span class="w"> </span><span class="s2">&quot;second&quot;</span>:<span class="w"> </span><span class="m">0</span>, <span class="w"> </span><span class="s2">&quot;month&quot;</span>:<span class="w"> </span><span 
class="s2">&quot;FEBRUARY&quot;</span> <span class="w"> </span><span class="o">}</span>, <span class="w"> </span><span class="s2">&quot;completedAt&quot;</span>:<span class="w"> </span>null <span class="w"> </span><span class="o">}</span> <span class="o">]</span> </pre></div> <p class="note"> The <code>name</code> field comes back as <code>null</code> because the model maps a column called <code>name</code> while the table's column is called <code>item</code>. Changing the annotation to <code>@Column(name = "item")</code> should populate it. </p><p>And we're done!</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I really enjoy using Java for REST APIs, avoiding Spring and Play. Use simple but mature libraries that are no more difficult to cobble together than everything you must do in Go or Flask for a REST API. vs Go you get generics and vs python you get safety<a href="https://t.co/twmjZprow6">https://t.co/twmjZprow6</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1223733417453465601?ref_src=twsrc%5Etfw">February 1, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/a-minimal-rest-api-in-java.htmlSat, 01 Feb 2020 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 6. an x86 upgradehttp://notes.eatonphil.com/compiler-basics-an-x86-upgrade.html<p class="note"> Previously in compiler basics: <! forgive me, for I have sinned > <br /> <a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a> <br /> <a href="/compiler-basics-functions.html">2. user-defined functions and variables</a> <br /> <a href="/compiler-basics-llvm.html">3. LLVM</a> <br /> <a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a> <br /> <a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a> </p><p>This post upgrades the ulisp x86 backend from using a limited set of registers (with no spilling support) to solely using the stack to pass values between expressions.</p> <p>This is a slightly longer post since we've got a lot of catching up to do to get to feature parity with the LLVM backend. 
Namely:</p> <ul> <li>"Infinite" locals and parameters</li> <li>Function definitions</li> <li>Variable references</li> <li>Arithmetic and logical operations</li> <li>If</li> <li>Syscalls</li> </ul> <p>We'll tackle the first four points first and finish up with the last two. This way we can support the same fibonacci program, printing integers to stdout, that the LLVM backend supports.</p> <p>As always the <a href="https://github.com/eatonphil/ulisp">code is available on GitHub</a>.</p> <p>But first a digression into how this is suddenly easy for us to do with x86 and one-pass (sorta) code generation.</p> <h3 id="stack-based-languages">Stack-based languages</h3><p>Stack-based languages have the extremely convenient attribute that values are (by default) stored on the stack, which lets a code generator targeting a stack-based language skip register allocation entirely. And as it happens, x86 has enough stack support (<code>PUSH</code> and <code>POP</code>) to make it easy to treat as a stack machine.</p> <p>As we build out the code generator for x86 as a stack machine we need to keep two commitments in mind:</p> <ul> <li>Every expression must pop all of its arguments/operands off the stack</li> <li>Every expression must push exactly one result back onto the stack</li> </ul> <p>In the future, we may replace the second commitment. 
But for now it is more than enough.</p> <h3 id="boilerplate">Boilerplate</h3><p>We'll start with the existing x86 backend code and strip all the implementation code:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;child_process&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;os&#39;</span><span class="p">);</span> <span class="kd">let</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_MAP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">darwin</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">exit</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;0x2000001&#39;</span><span class="p">,</span> <span class="w"> </span><span class="nx">write</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;0x2000004&#39;</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">linux</span><span 
class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">exit</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span> <span class="w"> </span><span class="nx">write</span><span class="o">:</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="p">}[</span><span class="nx">os</span><span class="p">.</span><span class="nx">platform</span><span class="p">()];</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{}</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span 
class="p">),</span> <span class="w"> </span><span class="k">if</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareArithmeticWrappers</span><span class="p">(),</span> <span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareLogicalWrappers</span><span class="p">(),</span> <span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">prepareSyscallWrappers</span><span class="p">(),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">prepareArithmeticWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">prepareLogicalWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">prepareSyscallWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">args</span><span class="w"> 
</span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Invalid call to emit&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">args</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span 
class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">topLevel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">emitPrefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span 
class="p">,</span><span class="w"> </span><span class="s1">&#39;.global _main\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.text\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">emitPostfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_main:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;CALL main&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;MOV RDI, RAX&#39;</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">&#39;exit&#39;</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;SYSCALL&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">getOutput</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">output</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Leave at most one empty line</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">output</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/\n\n\n+/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;\n\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Compiler</span><span class="p">();</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span 
class="nx">emitPrefix</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="kc">true</span><span class="p">);</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">emitPostfix</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">getOutput</span><span class="p">();</span> <span class="p">};</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">build</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">buildDir</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;prog&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="sb">`</span><span class="si">${</span><span 
class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span> <span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span> <span class="w"> </span><span class="sb">`gcc -mstackrealign -masm=intel -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="p">};</span> </pre></div> <p>The prefix and postfix stay mostly the same as in the original implementation. But we'll assume several new helpers to reach parity with the LLVM backend:</p> <ul> <li><code>compileDefine</code></li> <li><code>compileBegin</code></li> <li><code>compileIf</code></li> <li><code>compileCall</code></li> <li><code>prepareArithmeticWrappers</code></li> <li><code>prepareLogicalWrappers</code></li> <li><code>prepareSyscallWrappers</code></li> </ul> <p>The <code>prepareArithmeticWrappers</code> helper will define wrappers for arithmetic instructions. The <code>prepareLogicalWrappers</code> helper will define wrappers for logical instructions. And the <code>prepareSyscallWrappers</code> helper will define a wrapper for syscalls and generate builtins based on the <code>SYSCALL_MAP</code> entries.</p> <h3 id="scope">Scope</h3><p>Similar to our LLVM backend's Context and Scope helpers, we'll define our own for the x86 backend. 
Since we're placing all locals on the stack, the two most important things Scope will do for us are:</p> <ul> <li>Map identifiers to escaped strings</li> <li>Store and increment the location of the local on the stack</li> </ul> <p>Here's what it will look like:</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">localOffset</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Replace every hyphen, not just the first one</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">safe</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">localOffset</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">safe</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">symbol</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">localOffset</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">lookup</span><span class="p">(</span><span class="nx">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">safe</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">name</span><span class="o">:</span><span class="w"> </span><span class="nx">safe</span><span class="p">,</span><span class="w"> </span><span class="nx">offset</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span 
class="p">[</span><span class="nx">safe</span><span class="p">]</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">copy</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span> <span class="w"> </span><span class="c1">// In the future we may need to store s.scopeOffset = this.scopeOffset + 1</span> <span class="w"> </span><span class="c1">// so we can read outer-scoped values at runtime.</span> <span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">map</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">s</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="compileexpression">compileExpression</h3><p>An expression will be one of:</p> <ul> <li>A function call (possibly a builtin like <code>def</code> or <code>+</code>)</li> <li>A literal value (e.g. <code>29</code>)</li> <li>A reference (e.g. <code>&c</code>)</li> <li>An identifier (e.g. <code>my-var</code>)</li> </ul> <p>We'll handle compiling an expression in that order. 
If the AST argument passed to <code>compileExpression</code> is an array, we will call <code>compileCall</code> and return.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>If the AST is a number, we will push the number onto the stack and return.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span 
class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span 
class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>If the AST is a string that starts with <code>&amp;</code> we will look up the location of the identifier after the <code>&amp;</code>, push its <em>location</em> onto the stack and return.</p> <p>We count on the Scope storing each identifier's offset from the "frame pointer", which we will later set up to be stored in <code>RBP</code>.</p> <p>Locals will be stored after the frame pointer (at lower addresses, since the stack grows down) and parameters will be stored before it (at higher addresses). So we'll need to subtract from or add to the frame pointer depending on whether the offset is positive (a local) or negative (a parameter).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span 
class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;&amp;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Copy the frame pointer so we can return an offset from it</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span 
class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, RBP`</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">&#39;ADD&#39;</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;SUB&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb"> # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Finally, we'll look up 
the identifier and copy the value (at its offset from the frame pointer) to the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span 
class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;&amp;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Copy the frame pointer so we can return an offset from it</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, RBP`</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> 
</span><span class="s1">&#39;ADD&#39;</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;SUB&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb"> # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Variable lookup (lookup returns null for unknown names)</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">arg</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span 
class="nx">offset</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offset</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="mf">0</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span> <span class="w"> </span><span class="nx">depth</span><span class="p">,</span> <span class="w"> </span><span class="sb">`PUSH [RBP </span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nb">Math</span><span class="p">.</span><span class="nx">abs</span><span class="p">(</span><span class="nx">offset</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="p">)</span><span class="si">}</span><span class="sb">] # </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span> <span class="w"> </span><span class="s1">&#39;Attempt to reference undefined variable or unsupported literal: &#39;</span><span 
class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="nx">arg</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And that's it for handling expressions! Let's add <code>compileCall</code> support now that we've referenced it.</p> <h3 id="compilecall">compileCall</h3><p>A call will first check if the function being called is a builtin. If so, it will immediately pass control to the builtin.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Otherwise it will compile every argument to the call (which will leave all the resulting 
values on the stack.)</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile registers and store on the stack</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span 
class="w"> </span><span class="nx">depth</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we will check that the function is defined and call it.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile registers and store on the stack</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">fn</span><span class="p">.</span><span class="nx">name</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Attempt to call undefined function: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we'll reset the stack pointer (to maintain our commitment) based on the number of arguments and push <code>RAX</code> (where the return result of 
the function call will be stored) onto the stack. We'll make two minor optimizations for when there is exactly zero or one argument to the function.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile registers and store on the stack</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span 
class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">lookup</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fn</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">fn</span><span class="p">.</span><span class="nx">name</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Attempt to call undefined function: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span 
class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Drop the args</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`ADD RSP, </span><span class="si">${</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mf">8</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], RAX\n`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;PUSH RAX\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>When there is only one argument, we can just set the top value on the stack to be the return result of the call 
rather than resetting the stack pointer just to push onto it.</p> <p>And that's it for <code>compileCall</code>! Now that we've got a feel for expressions and function calls, let's add some simple arithmetic operations.</p> <h3 id="preparearithmeticwrappers">prepareArithmeticWrappers</h3><p>There are two major kinds of arithmetic instructions we'll wrap for now:</p> <ul> <li>"General" instructions that operate on two arguments, putting the return result in the first argument</li> <li>"RAX" instructions that operate on <code>RAX</code> and the first argument, putting the return result in <code>RAX</code> and possibly <code>RDX</code></li> </ul> <h4 id="preparegeneral">prepareGeneral</h4><p>This helper will compile its two arguments and pop the second argument into <code>RAX</code>. This is because most x86 instructions allow at most one memory operand, so if one argument is a memory address the other must be a register (or an immediate).</p> <p>We'll use the stack address as the first argument so that 1) non-commutative operations are correct and 2) the result is stored right back onto the stack in the right location.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile operation</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span 
class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> </pre></div> <h4 id="preparerax">prepareRax</h4><p>This helper will similarly compile its two arguments and pop the second argument into <code>RAX</code>. But the RAX-implicit instructions (such as <code>MUL</code> and <code>DIV</code>) take their first operand from <code>RAX</code>, so we'll use the <code>XCHG</code> instruction to swap <code>RAX</code> with the value on the top of the stack (the first argument). After the swap, <code>RAX</code> holds the first argument and the top of the stack holds the second.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">arg</span><span class="p">,</span> <span class="w"> </span><span class="nx">scope</span><span class="p">,</span> <span class="w"> </span><span class="nx">depth</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument, store in RAX</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// POP second argument and swap with first</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span 
class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span> </pre></div> <p>This may seem roundabout but remember that we <em>must</em> pop all arguments to the instruction to maintain our commitment.</p> <p>Next we'll zero out the <code>RDX</code> register if the operation is <code>DIV</code>, perform the operation, and store the result on the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">arg</span><span class="p">,</span> <span class="w"> </span><span class="nx">scope</span><span class="p">,</span> <span class="w"> </span><span class="nx">depth</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span 
class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument, store in RAX</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// POP second argument and swap with first</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Reset RDX for DIV</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span 
class="s1">&#39;DIV&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XOR RDX, RDX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile operation</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> QWORD PTR [RSP]`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Swap the top of the stack</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], </span><span class="si">${</span><span class="nx">outRegister</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> </pre></div> <p>We parameterize the out register because the <code>%</code> wrapper will call <code>DIV</code> but needs <code>RDX</code> (where <code>DIV</code> stores the remainder) rather than <code>RAX</code> after the operation.</p> <h4 id="preparearithmeticwrappers">prepareArithmeticWrappers</h4><p>Putting everything together we get:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareArithmeticWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// General 
operations</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span 
class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile operation</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="c1">// Operations that use RAX implicitly</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareRax</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">,</span><span class="w"> </span><span class="nx">outRegister</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">arg</span><span class="p">,</span> <span class="w"> </span><span class="nx">scope</span><span class="p">,</span> <span class="w"> </span><span class="nx">depth</span><span class="p">,</span> <span class="w"> </span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument, store in RAX</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span 
class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// POP second argument and swap with first</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XCHG [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Reset RDX for DIV</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;DIV&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`XOR RDX, RDX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile operation</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">toUpperCase</span><span class="p">()</span><span class="si">}</span><span class="sb"> QWORD PTR [RSP]`</span><span class="p">);</span> 
<span class="w"> </span><span class="c1">// Swap the top of the stack</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], </span><span class="si">${</span><span class="nx">outRegister</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&amp;&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">&#39;and&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;|&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">&#39;or&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;=&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareGeneral</span><span class="p">(</span><span class="s1">&#39;mov&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">&#39;mul&#39;</span><span class="p">),</span> 
<span class="w"> </span><span class="s1">&#39;/&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">&#39;div&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;%&#39;</span><span class="o">:</span><span class="w"> </span><span class="nx">prepareRax</span><span class="p">(</span><span class="s1">&#39;div&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RDX&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next we'll tackle <code>compileBegin</code> and <code>compileDefine</code>.</p> <h3 id="compilebegin">compileBegin</h3><p>A begin form is an expression made up of a series of expressions. The value of every expression except the last is thrown away, and the value of the last expression is the result of the begin form.</p> <p>To compile this form we will compile each expression passed in and pop from the stack to throw its value away. 
If the expression is the last in the list we will not pop since it is the result of the begin form.</p> <p>We will add one exception to this popping logic: if the begin is called from the top-level we will omit the popping.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">topLevel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">topLevel</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> 
</span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX # Ignore non-final expression`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>That's it for <code>compileBegin</code>!</p> <h3 id="compiledefine">compileDefine</h3><p>The prelude for a function definition will add its name to scope, push the current frame pointer (<code>RBP</code>) onto the stack and store the current stack pointer (<code>RSP</code>) as the new frame pointer (<code>RBP</code>).</p> <p>Remember that we use the frame pointer as a point of reference when setting and getting local and parameter values. 
It works out entirely by convention.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> 
</span><span class="sb">`PUSH RBP`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next we copy the parameters into local scope at their negative (from the frame pointer) location. In the future we may decide to actually copy in the parameter <em>values</em> into the local stack but for now there's no benefit.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span 
class="nx">copy</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy params into local scope</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span 
class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Next we'll compile the body of the function within a <code>begin</code> block.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span 
class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy params into local scope</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span> <span class="w"> </span><span class="p">});</span> <span class="w"> 
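</span><span class="c1">// Pass childScope in for reference when body is compiled.</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then in the postlude we'll pop the stack (for the return result of the begin form), save it in RAX, pop the previous frame pointer back to restore the calling frame, and return.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">assign</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">safe</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RBP`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RBP, RSP\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy params into local scope</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">map</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="o">-</span><span class="mf">1</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="nx">params</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span>
<span class="w"> </span><span class="c1">// Save the return value</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RBP\n`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And now we're ready to compile a simple program!</p> <h3 id="our-first-program">Our first program</h3><p>Here's a simple one we can support:</p> <div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">tests/meaning-of-life.lisp</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">8</span><span class="w"> 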
</span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">17</span><span class="p">)))</span> </pre></div> <p>We'll compile this program without the ulisp kernel (which contains a lisp library we cannot currently compile):</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/meaning-of-life.lisp<span class="w"> </span>--no-kernel<span class="w"> </span>--backend<span class="w"> </span>x86 $<span class="w"> </span>./build/prog $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">42</span> </pre></div> <p>Not bad. Let's finish up with support for <code>prepareLogicalWrappers</code>, <code>prepareSyscallWrappers</code>, and <code>compileIf</code>.</p> <h3 id="preparelogicalwrappers">prepareLogicalWrappers</h3><p>Storing logical results as values is a bit of a pain. Most of the internet wants you to use branching. And a better compiler may optimize an idiom like <code>(if (&gt; 5 2) ...)</code> into a single branch.</p> <p>But we're going to resort to an instruction I just learned about called <code>CMOV</code>. It lets us conditionally assign a value based on flags, similar to how you can conditionally branch.</p> <p>Otherwise we'll follow a pattern similar to our arithmetic wrappers. 
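</p> <p>As a rough mental model of the conditional move, here is a sketch in JavaScript. It is not part of ulisp: the <code>cmov</code> and <code>compare</code> helpers are hypothetical, and the flags set by <code>CMP</code> are simplified to a direct comparison of two non-negative numbers.</p>
<div class="highlight"><pre><span></span>// Rough model of CMOV*: the move happens only when the condition holds.
// The flags set by CMP are modeled as a direct comparison of two numbers
// (a simplification; real CMOVA/CMOVB read the unsigned comparison flags).
function cmov(mnemonic, first, second, dest, src) {
  const conditions = {
    CMOVA: first > second,
    CMOVAE: first >= second,
    CMOVB: second > first,
    CMOVBE: second >= first,
    CMOVE: first === second,
    CMOVNE: first !== second,
  };
  if (!(mnemonic in conditions)) {
    throw new Error('Unknown mnemonic: ' + mnemonic);
  }
  return conditions[mnemonic] ? src : dest;
}

// The destination starts at 0 and the source holds 1,
// so the result is always 0 or 1.
function compare(mnemonic, first, second) {
  return cmov(mnemonic, first, second, 0, 1);
}

console.log(compare('CMOVA', 5, 2)); // 1
console.log(compare('CMOVB', 5, 2)); // 0
</pre></div>
<p>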
At the end of the procedure we will have a 0 or a 1 on the top of the stack.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareLogicalWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prepareComparison</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">operator</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">[</span><span class="nx">operator</span><span class="p">]</span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">depth</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# </span><span class="si">${</span><span class="nx">operator</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile first argument, store in RAX</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span 
class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile second argument</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">1</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile operation</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`CMP [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Reset RAX to serve as CMOV* dest, MOV to keep flags (vs. 
XOR)</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, 0`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Conditional set [RSP]</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s1">&#39;&gt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVA&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;&gt;=&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVAE&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVB&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;&lt;=&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVBE&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;==&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVE&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;!=&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;CMOVNE&#39;</span><span class="p">,</span> <span class="w"> </span><span class="p">}[</span><span class="nx">operator</span><span class="p">];</span> <span class="w"> </span><span class="c1">// CMOV* requires the source to be memory or register</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span 
class="w"> </span><span class="sb">`MOV DWORD PTR [RSP], 1`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// CMOV* requires the dest to be a register</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">operation</span><span class="si">}</span><span class="sb"> RAX, [RSP]`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV [RSP], RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# End </span><span class="si">${</span><span class="nx">operator</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;&gt;&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;&gt;=&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;&lt;&#39;</span><span class="p">),</span> <span 
class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;&lt;=&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;==&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">...</span><span class="nx">prepareComparison</span><span class="p">(</span><span class="s1">&#39;!=&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h3 id="preparesyscallwrappers">prepareSyscallWrappers</h3><p>This helper is similar to <code>compileCall</code> except that it needs to follow the System V ABI, passing arguments in registers, and use the <code>SYSCALL</code> instruction rather than <code>CALL</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">prepareSyscallWrappers</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">registers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;RDI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RSI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RDX&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R10&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R8&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R9&#39;</span><span class="p">];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">wrappers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span 
class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="nx">SYSCALL_MAP</span><span class="p">).</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">obj</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">wrappers</span><span class="p">[</span><span class="sb">`syscall/</span><span class="si">${</span><span class="nx">key</span><span class="si">}</span><span class="sb">`</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&gt;</span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">length</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="sb">`Too many arguments to syscall/</span><span class="si">${</span><span class="nx">key</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Compile first</span> <span class="w"> 
</span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Then pop to avoid possible register contention</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">registers</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">),</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span 
class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;SYSCALL&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH RAX`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">wrappers</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And we're set! Last up is <code>compileIf</code>.</p> <h3 id="compileif">compileIf</h3><p>This is standard code generation but gets a little tricky due to our stack commitments. Testing must pop the test value off the stack. 
And then/else blocks must <em>push</em> a value onto the stack (even if there is no else block).</p> <p>Here is an example we'd like to support:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nv">foo</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">do-bar</span><span class="p">))</span> </pre></div> <p>We compile the test and branch:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;# If&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span 
class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Must pop/use up argument in test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Then we compile the then block and jump to after the else block afterward.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> 
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;# If&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Must pop/use up argument in test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span 
class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile then section</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If then`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JMP .after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>Finally we compile the else block if it exists, and otherwise we push a zero onto the stack (possibly to represent null).</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">els</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span 
class="nx">depth</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;# If&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">branch</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`else_branch`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">GLOBAL_COUNTER</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Must pop/use up argument in test</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`TEST RAX, 
RAX`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JZ .</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile then section</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If then`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">then</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`JMP .after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">\n`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile else section</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="sb">`# If else`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> 
</span><span class="sb">`.</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">els</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">els</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">,</span><span class="w"> </span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;PUSH 0 # Null else branch&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`.after_</span><span class="si">${</span><span class="nx">branch</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;# End if&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And we're ready for an interesting program! 
Let's print (to stdout) the result of <code>fib(20)</code>.</p> <h3 id="fibonacci">Fibonacci</h3><div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="o">.</span><span class="nv">/tests/fib.lisp</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">&lt;</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> <span class="w"> </span><span class="nv">n</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">2</span><span class="p">)))))</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="mi">20</span><span class="p">)))</span> </pre></div> <p>And check out the kernel:</p> <div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span 
class="w"> </span><span class="o">.</span><span class="nv">/lib/kernel.lisp</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nv">c</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">syscall/write</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">&amp;c</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">&gt;</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nb">/</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">48</span><span class="w"> </span><span class="p">(</span><span class="nv">%</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">))))</span> </pre></div> <p>Compile and run it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp<span class="w"> </span>--backend<span class="w"> </span>x86 $<span class="w"> </span>./build/prog <span 
class="m">6765</span> </pre></div> <p>And we're in business!</p> <p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Latest post in the compiler basics series: an x86 upgrade. We&#39;ve got basic syscall support, &quot;infinite&quot; locals and parameters, and if/else. More than enough to handle printing integers to stdout and recursive fibonacci. <a href="https://t.co/B3OV0vEX1V">https://t.co/B3OV0vEX1V</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1203816831456284677?ref_src=twsrc%5Etfw">December 8, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/compiler-basics-an-x86-upgrade.htmlSun, 08 Dec 2019 00:00:00 +0000Confusion and disengagement in meetingshttp://notes.eatonphil.com/confusion-disengagement-in-meetings.html<p>The quickest way to cut through confusion or disagreement among otherwise amiable and honest folks is to ask questions.</p> <p>Ask early so you don't waste time. But it's not enough to just ask clarifying questions because the <strong>answers</strong> won't always be clear.</p> <p>Sounds like Human Interaction 101, and maybe it is. These techniques show up more when discussing <strong>outcomes</strong> and very rarely when discussing <strong>assumptions</strong>.</p> <p>Meetings are called to discuss outcomes, not assumptions. But assumptions almost always need to be called into question too.</p> <p>If you have clarity personally but you observe confusion and disengagement, <strong>ask questions and summarize</strong>. Someone must be aware of the group and be willing to sound dumb.</p> <p>If you aren't aware of confusion or disengagement, start paying attention. 
Addressing it doesn't need to be hard, and doing so is personally meaningful.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Address confusion and disengagement in meetings by asking questions and summarizing, whether you&#39;re confused or not. Question outcomes _and_ assumptions. <a href="https://t.co/2OPifEBSq5">https://t.co/2OPifEBSq5</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1200972237756674049?ref_src=twsrc%5Etfw">December 1, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/confusion-disengagement-in-meetings.htmlSat, 30 Nov 2019 00:00:00 +0000Interpreting Gohttp://notes.eatonphil.com/interpreting-go.html<p>After spending some time at work on tooling for keeping documentation in sync with Go struct definitions, I had enough exposure to Go's built-in parsing packages that the next step was clear: write an interpreter. <a href="http://notes.eatonphil.com/interpreting-typescript.html">It's a great way to get more comfortable with a language's AST.</a></p> <p>In this post we'll use the Go parser package to interpret the AST directly (as opposed to compiling to a bytecode VM), with just enough functionality to support a recursive implementation of the fibonacci algorithm:</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kd">func</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="w"> </span><span class="kt">int</span><span class="p">)</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="o">-</span><span class="mi">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">a</span><span class="o">-</span><span class="mi">2</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">println</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mi">15</span><span class="p">))</span> <span class="p">}</span> </pre></div> <p class="note"> You'll note this program never defines or imports <code>println</code>. Go predeclares it as a builtin, so the interpreter has to supply it. We'll provide it in the runtime to make things easier on ourselves. </p><p>The fibonacci algorithm is my go-to minimal program that forces us to deal with basic aspects of:</p> <ul> <li>Function definitions</li> <li>Function calls</li> <li>Function arguments</li> <li>Function return values</li> <li>If/else</li> <li>Assignment</li> <li>Arithmetic and boolean operators</li> </ul> <p>We'll do this in around 200 LoC. 
Project code is available on <a href="https://github.com/eatonphil/goi">GitHub</a>.</p> <p>A follow-up post will add an iterative fibonacci implementation, which requires basic support for:</p> <ul> <li>Local variables</li> <li>Loops</li> </ul> <h3 id="first-steps">First steps</h3><p>I always start exploring an AST by practicing error-driven development. It's helpful to have the Go <a href="https://golang.org/pkg/go/ast/">AST</a>, <a href="https://golang.org/pkg/go/parser/">parser</a>, and <a href="https://golang.org/pkg/go/token/">token</a> package docs handy as well.</p> <p>We'll focus on single-file programs and start with <a href="https://golang.org/pkg/go/parser/#ParseFile">parser.ParseFile</a>. This function will return an <a href="https://golang.org/pkg/go/ast/#File">*ast.File</a>. This in turn contains a list of <a href="https://golang.org/pkg/go/ast/#Decl">Decl</a>s. Unfortunately the type system stops being helpful at this point: <code>Decl</code> is an interface, and nothing tells us up front which concrete types implement it. 
So we'll switch on the concrete type and error until we know what we need to know.</p> <div class="highlight"><pre><span></span><span class="kn">package</span><span class="w"> </span><span class="nx">main</span> <span class="kn">import</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="s">&quot;go/ast&quot;</span> <span class="w"> </span><span class="s">&quot;go/parser&quot;</span> <span class="w"> </span><span class="s">&quot;go/token&quot;</span> <span class="w"> </span><span class="s">&quot;io/ioutil&quot;</span> <span class="w"> </span><span class="s">&quot;log&quot;</span> <span class="w"> </span><span class="s">&quot;os&quot;</span> <span class="w"> </span><span class="s">&quot;reflect&quot;</span> <span class="p">)</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">decl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Decls</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decl</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span 
class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown decl type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">d</span><span class="p">),</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">NewFileSet</span><span class="p">()</span><span class="w"> </span><span class="c1">// positions are relative to fset</span> <span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unable to read file: %s&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">ParseFile</span><span class="p">(</span><span class="nx">fset</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unable to parse file: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">f</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p>Build and run:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">09</span>:43:48<span class="w"> </span>Unknown<span class="w"> 
</span>decl<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.FuncDecl<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>Doc:&lt;nil&gt;<span class="w"> </span>Recv:&lt;nil&gt;<span class="w"> </span>Name:fib<span class="w"> </span>Type:0xc000096320<span class="w"> </span>Body:0xc00009a3c0<span class="o">}</span> </pre></div> <p>Cool! This is the declaration of the <code>fib</code> function and its type is <a href="https://golang.org/pkg/go/ast/#FuncDecl">*ast.FuncDecl</a>.</p> <h3 id="interpreting-ast.funcdecl">Interpreting ast.FuncDecl</h3><p>A function declaration needs to add its name to a context map, mapped to a function reference for use in function calls. Since Go throws everything into the same context namespace, we can simply pass around a map of strings to <code>value</code>s, where a <code>value</code> can be any Go value. To facilitate this, we'll define a <code>value</code> struct to hold an integer to represent "kind" and an empty interface "value". 
When a value is referenced it will have to switch on the "kind" and then cast the "value".</p> <p>Additionally, and unlike a value-oriented language like Scheme, we'll need to track a set of <code>return</code> values at all stages through interpretation so, when set, we can short circuit execution.</p> <div class="highlight"><pre><span></span><span class="kd">type</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="kt">uint</span> <span class="kd">const</span><span class="w"> </span><span class="p">(</span> <span class="w"> </span><span class="nx">i64</span><span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">iota</span> <span class="w"> </span><span class="nx">fn</span> <span class="w"> </span><span class="nx">bl</span> <span class="p">)</span> <span class="kd">type</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">kind</span><span class="w"> </span><span class="nx">kind</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="kd">interface</span><span class="p">{}</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">context</span><span class="w"> </span><span class="kd">map</span><span class="p">[</span><span class="kt">string</span><span class="p">]</span><span class="nx">value</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="nb">copy</span><span class="p">()</span><span class="w"> </span><span class="nx">context</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cpy</span><span 
class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">context</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">cpy</span><span class="p">[</span><span class="nx">key</span><span class="p">]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">cpy</span> <span class="p">}</span> <span class="kd">type</span><span class="w"> </span><span class="nx">ret</span><span class="w"> </span><span class="kd">struct</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">set</span><span class="w"> </span><span class="kt">bool</span> <span class="w"> </span><span class="nx">vs</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">)</span><span class="w"> </span><span class="nx">setValues</span><span class="p">(</span><span class="nx">vs</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">vs</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span 
class="nx">vs</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Name</span><span class="p">.</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span><span class="p">{</span> <span class="w"> </span><span class="nx">fn</span><span class="p">,</span> <span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span 
class="nx">interpret</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">File</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">decl</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">f</span><span class="p">.</span><span class="nx">Decls</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">d</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">decl</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown decl type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span 
class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">d</span><span class="p">),</span><span class="w"> </span><span class="nx">d</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now that we have the idea of return management and contexts set out, let's fill out the actual function declaration callback. Inside we'll need to copy the context so variables declared inside the function are not visible outside. Then we'll iterate over the parameters and map them in context to the associated argument. Finally we'll interpret the body.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BlockStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretFuncDecl</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">FuncDecl</span><span class="p">)</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="nx">fd</span><span class="p">.</span><span class="nx">Name</span><span class="p">.</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">value</span><span class="p">{</span> <span class="w"> </span><span class="nx">fn</span><span class="p">,</span> <span class="w"> </span><span class="kd">func</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">childCtx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ctx</span><span class="p">.</span><span class="nb">copy</span><span class="p">()</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">param</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Type</span><span class="p">.</span><span class="nx">Params</span><span class="p">.</span><span class="nx">List</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">childCtx</span><span class="p">[</span><span class="nx">param</span><span class="p">.</span><span class="nx">Names</span><span class="p">[</span><span class="mi">0</span><span 
class="p">].</span><span class="nx">String</span><span class="p">()]</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">childCtx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">fd</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we'll add a call to the interpreted <code>main</code> to the end of the interpreter's <code>main</code>:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fset</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">NewFileSet</span><span class="p">()</span><span class="w"> </span><span class="c1">// positions are relative to fset</span> <span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">ioutil</span><span class="p">.</span><span class="nx">ReadFile</span><span class="p">(</span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> 
</span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unable to read file: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">f</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">parser</span><span class="p">.</span><span class="nx">ParseFile</span><span class="p">(</span><span class="nx">fset</span><span class="p">,</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">Args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="nx">src</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">err</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unable to parse file: %s&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">err</span><span class="p">.</span><span class="nx">Error</span><span class="p">())</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">ctx</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span 
class="nx">context</span><span class="p">{}</span> <span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">f</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">ctx</span><span class="p">[</span><span class="s">&quot;main&quot;</span><span class="p">].</span><span class="nx">value</span><span class="p">.(</span><span class="kd">func</span><span class="p">(</span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">))(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{})</span> <span class="p">}</span> </pre></div> <p>Next step!</p> <h3 id="interpreting-ast.blockstmt">Interpreting ast.BlockStmt</h3><p>For this AST node, we'll iterate over each statement and interpret it. 
If the return value has been set, we'll exit the loop to short-circuit execution.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">bs</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BlockStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">bs</span><span class="p">.</span><span class="nx">List</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span 
class="w"> </span><span class="nx">stmt</span><span class="p">)</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Next step!</p> <h3 id="interpreting-ast.stmt">Interpreting ast.Stmt</h3><p>Since <a href="https://golang.org/pkg/go/ast/#Stmt">ast.Stmt</a> is another interface, we're back to error-driven development.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown stmt type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span 
class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And the trigger:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:15:14<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.ExprStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>X:0xc0000a02c0<span class="o">}</span> </pre></div> <p>Great! Checking the docs on <a href="https://golang.org/pkg/go/ast/#ExprStmt">ast.ExprStmt</a> we'll just skip directly to a call to a new function <code>interpretExpr</code>:</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span 
class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ExprStmt</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">X</span><span class="p">)</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown stmt type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">s</span><span class="p">),</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Moving on!</p> <h3 id="interpreting-ast.expr">Interpreting ast.Expr</h3><p>We've got another <a 
href="https://golang.org/pkg/go/ast/#Expr">interface</a>. Let's error!</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown expr type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And the trigger:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:19:16<span class="w"> 
</span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.CallExpr<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>Fun:println<span class="w"> </span>Lparen:146<span class="w"> </span>Args:<span class="o">[</span>0xc0000a2280<span class="o">]</span><span class="w"> </span>Ellipsis:0<span class="w"> </span>Rparen:154<span class="o">}</span> </pre></div> <p>Cool! For a call we'll evaluate the arguments, evaluate the function expression itself, cast the resulting value to a function, and call it.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">ce</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">fnr</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">fnr</span><span class="p">,</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">Fun</span><span class="p">)</span> <span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">fnr</span><span 
class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">Args</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">vr</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">vr</span><span class="p">,</span><span class="w"> </span><span class="nx">arg</span><span class="p">)</span> <span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">vr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">fn</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kd">func</span><span class="p">(</span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span 
class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">))(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">values</span><span class="p">)</span> <span class="p">}</span> </pre></div> <p class="note"> All of this casting is unsafe because we aren't doing a type-checking stage. But we can ignore this because if a type-checking stage were introduced (which it needs to be at some point), it would prevent bad casts from happening. </p><h3 id="handling-more-ast.expr-implementations">Handling more ast.Expr implementations</h3><p>Let's give the interpreter a shot again:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:28:17<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.Ident<span class="o">)</span>:<span class="w"> </span>println </pre></div> <p>We'll need to add <a href="https://golang.org/pkg/go/ast/#Ident">ast.Ident</a> support to <code>interpretExpr</code>. This will be a simple lookup in context. 
We'll also add a <code>setValue</code> helper since the <code>ret</code> value is serving double-duty as a value passing mechanism and also a function's return value (the only place where multiple values are a thing).</p> <div class="highlight"><pre><span></span><span class="o">...</span> <span class="kd">func</span><span class="w"> </span><span class="p">(</span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">)</span><span class="w"> </span><span class="nx">setValue</span><span class="p">(</span><span class="nx">v</span><span class="w"> </span><span class="nx">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">values</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span><span class="p">{</span><span class="nx">v</span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">set</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span> <span class="p">}</span> <span class="o">...</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span 
class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Ident</span><span class="p">:</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">ctx</span><span class="p">[</span><span class="nx">e</span><span class="p">.</span><span class="nx">Name</span><span class="p">])</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown expr type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>This is also a good time to add the <code>println</code> builtin to our top-level context.</p> <div class="highlight"><pre><span></span><span class="k">func</span><span class="w"> 
</span><span class="n">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">...</span> <span class="w"> </span><span class="n">ctx</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="n">context</span><span class="p">{}</span> <span class="w"> </span><span class="n">interpret</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="n">f</span><span class="p">)</span> <span class="w"> </span><span class="n">ctx</span><span class="p">[</span><span class="s2">&quot;println&quot;</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">value</span><span class="p">{</span> <span class="w"> </span><span class="n">fn</span><span class="p">,</span> <span class="w"> </span><span class="k">func</span><span class="p">(</span><span class="n">ctx</span><span class="w"> </span><span class="n">context</span><span class="p">,</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="o">*</span><span class="n">ret</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="p">[]</span><span class="n">interface</span><span class="p">{}</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="n">_</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="w"> </span><span class="p">:</span><span class="o">=</span><span class="w"> </span><span class="nb">range</span><span class="w"> </span><span class="n">args</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">append</span><span class="p">(</span><span class="n">values</span><span class="p">,</span><span class="w"> </span><span class="n">arg</span><span class="o">.</span><span class="n">value</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">fmt</span><span class="o">.</span><span class="n">Println</span><span class="p">(</span><span class="n">values</span><span class="o">...</span><span class="p">)</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">var</span><span class="w"> </span><span class="n">r</span><span class="w"> </span><span class="n">ret</span> <span class="w"> </span><span class="n">ctx</span><span class="p">[</span><span class="s2">&quot;main&quot;</span><span class="p">]</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="p">(</span><span class="k">func</span><span class="p">(</span><span class="n">context</span><span class="p">,</span><span class="w"> </span><span class="o">*</span><span class="n">ret</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">))(</span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="n">r</span><span class="p">,</span><span class="w"> </span><span class="p">[]</span><span class="n">value</span><span class="p">{})</span> <span class="p">}</span> </pre></div> <h3 id="more-ast.expr-implementations">More ast.Expr implementations</h3><p>Running the interpreter again we get:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span 
class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:41:59<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.BasicLit<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>ValuePos:151<span class="w"> </span>Kind:INT<span class="w"> </span>Value:15<span class="o">}</span> </pre></div> <p>Easy enough: we'll switch on the "kind" and parse a string int to an int and wrap it in our value type.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">CallExpr</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretCallExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span 
class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">Ident</span><span class="p">:</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">ctx</span><span class="p">[</span><span class="nx">e</span><span class="p">.</span><span class="nx">Name</span><span class="p">])</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BasicLit</span><span class="p">:</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="p">.</span><span class="nx">Kind</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">INT</span><span class="p">:</span> <span class="w"> </span><span class="nx">i</span><span class="p">,</span><span class="w"> </span><span class="nx">_</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">strconv</span><span class="p">.</span><span class="nx">ParseInt</span><span class="p">(</span><span class="nx">e</span><span class="p">.</span><span class="nx">Value</span><span class="p">,</span><span class="w"> </span><span class="mi">10</span><span class="p">,</span><span class="w"> </span><span class="mi">64</span><span class="p">)</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span 
class="w"> </span><span class="nx">i</span><span class="p">})</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown basiclit type: %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown expr type (%s): %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">reflect</span><span class="p">.</span><span class="nx">TypeOf</span><span class="p">(</span><span class="nx">e</span><span class="p">),</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now we run again:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:48:46<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.IfStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>If:38<span class="w"> </span>Init:&lt;nil&gt;<span class="w"> </span>Cond:0xc0000ac150<span class="w"> </span>Body:0xc0000ac1b0<span class="w"> </span>Else:&lt;nil&gt;<span class="o">}</span> </pre></div> <p>Cool, more control flow!</p> <h3 id="interpreting-ast.ifstmt">Interpreting 
ast.IfStmt</h3><p>For <a href="https://golang.org/pkg/go/ast/#IfStmt">ast.IfStmt</a> we interpret the condition and, depending on the condition, interpret the body or the else node. In order to make empty else interpreting easier, we'll also add a nil short-circuit to <code>interpretStmt</code>.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretIfStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">IfStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="kc">nil</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Init</span><span class="p">)</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">cr</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">cr</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Cond</span><span class="p">)</span> <span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">cr</span><span class="p">.</span><span 
class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">bool</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">interpretBlockStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Body</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">is</span><span class="p">.</span><span class="nx">Else</span><span class="p">)</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">IfStmt</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretIfStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="o">...</span> </pre></div> <p>Let's try it out:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">10</span>:56:28<span class="w"> </span>Unknown<span class="w"> </span>expr<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.BinaryExpr<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>X:a<span class="w"> </span>OpPos:43<span class="w"> </span>Op:<span class="o">==</span><span class="w"> </span>Y:0xc00008a120<span class="o">}</span> </pre></div> <p>Great!</p> <h3 id="interpreting-ast.binaryexpr">Interpreting ast.BinaryExpr</h3><p>An <a href="https://golang.org/pkg/go/ast/#BinaryExpr">ast.BinaryExpr</a> has an <code>Op</code> field that we'll switch on to decide what operations to 
do. We'll interpret the left side and then the right side and finally perform the operation and return the result. The three binary operations we use in this program are <code>==</code>, <code>+</code> and <code>-</code>. We'll look these up in <a href="https://golang.org/pkg/go/token/#Token">go/token docs</a> to discover the associated constants.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretBinaryExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BinaryExpr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">xr</span><span class="p">,</span><span class="w"> </span><span class="nx">yr</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">xr</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">X</span><span class="p">)</span> <span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">xr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span 
class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">yr</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">Y</span><span class="p">)</span> <span class="w"> </span><span class="nx">y</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">yr</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">.</span><span class="nx">Op</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">ADD</span><span class="p">:</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">SUB</span><span class="p">:</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">i64</span><span class="p">,</span><span 
class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">token</span><span class="p">.</span><span class="nx">EQL</span><span class="p">:</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValue</span><span class="p">(</span><span class="nx">value</span><span class="p">{</span><span class="nx">bl</span><span class="p">,</span><span class="w"> </span><span class="nx">x</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="nx">y</span><span class="p">.</span><span class="nx">value</span><span class="p">.(</span><span class="kt">int64</span><span class="p">)})</span> <span class="w"> </span><span class="k">default</span><span class="p">:</span> <span class="w"> </span><span class="nx">log</span><span class="p">.</span><span class="nx">Fatalf</span><span class="p">(</span><span class="s">&quot;Unknown binary expression type: %+v&quot;</span><span class="p">,</span><span class="w"> </span><span class="nx">bexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span 
class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Expr</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">e</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">expr</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">BinaryExpr</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretBinaryExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">e</span><span class="p">)</span> <span class="w"> </span><span class="o">...</span> </pre></div> <p>Let's try one more time!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">2019</span>/10/12<span class="w"> </span><span class="m">11</span>:06:19<span class="w"> </span>Unknown<span class="w"> </span>stmt<span class="w"> </span><span class="nb">type</span><span class="w"> </span><span class="o">(</span>*ast.ReturnStmt<span class="o">)</span>:<span class="w"> </span><span class="p">&amp;</span><span class="o">{</span>Return:94<span class="w"> </span>Results:<span class="o">[</span>0xc000070540<span class="o">]}</span> </pre></div> <p>Awesome, last step.</p> <h3 id="interpreting-ast.returnstmt">Interpreting ast.ReturnStmt</h3><p>Based on the 
<a href="https://golang.org/pkg/go/ast/#ReturnStmt">ast.ReturnStmt</a> definition we'll have to interpret each expression and set all of them to the <code>ret</code> value.</p> <div class="highlight"><pre><span></span><span class="kd">func</span><span class="w"> </span><span class="nx">interpretReturnStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ReturnStmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">values</span><span class="w"> </span><span class="p">[]</span><span class="nx">value</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="k">range</span><span class="w"> </span><span class="nx">s</span><span class="p">.</span><span class="nx">Results</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">var</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="nx">ret</span> <span class="w"> </span><span class="nx">interpretExpr</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="o">&amp;</span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">expr</span><span class="p">)</span> <span class="w"> </span><span class="nx">values</span><span class="w"> 
</span><span class="p">=</span><span class="w"> </span><span class="nb">append</span><span class="p">(</span><span class="nx">values</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">values</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">r</span><span class="p">.</span><span class="nx">setValues</span><span class="p">(</span><span class="nx">values</span><span class="p">)</span> <span class="w"> </span><span class="k">return</span> <span class="p">}</span> <span class="kd">func</span><span class="w"> </span><span class="nx">interpretStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="w"> </span><span class="nx">context</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">*</span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">Stmt</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nx">stmt</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="kc">nil</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="nx">s</span><span class="w"> </span><span class="o">:=</span><span class="w"> </span><span class="nx">stmt</span><span class="p">.(</span><span class="kd">type</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> 
</span><span class="o">*</span><span class="nx">ast</span><span class="p">.</span><span class="nx">ReturnStmt</span><span class="p">:</span> <span class="w"> </span><span class="nx">interpretReturnStmt</span><span class="p">(</span><span class="nx">ctx</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">,</span><span class="w"> </span><span class="nx">s</span><span class="p">)</span> <span class="w"> </span><span class="o">...</span> </pre></div> <p>And let's try one last time:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>go<span class="w"> </span>build<span class="w"> </span>goi.go $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">377</span> </pre></div> <p>Looking good. :) Let's try with another input:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>fib.go package<span class="w"> </span>main func<span class="w"> </span>fib<span class="o">(</span>a<span class="w"> </span>int<span class="o">)</span><span class="w"> </span>int<span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="m">1</span> <span class="w"> </span><span class="o">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> 
</span>fib<span class="o">(</span>a-1<span class="o">)</span><span class="w"> </span>+<span class="w"> </span>fib<span class="o">(</span>a-2<span class="o">)</span> <span class="o">}</span> func<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>println<span class="o">(</span>fib<span class="o">(</span><span class="m">14</span><span class="o">))</span> <span class="o">}</span> $<span class="w"> </span>./goi<span class="w"> </span>fib.go <span class="m">233</span> </pre></div> <p>We've got the basics of an interpreter for Golang.</p> <p><blockquote class="twitter-tweet" data-conversation="none"><p lang="en" dir="ltr">Here&#39;s a blog post on building a simple AST interpreter for Go to support running a recursive fibonacci implementation <a href="https://t.co/5Zz388d8ZN">https://t.co/5Zz388d8ZN</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1183039387170430976?ref_src=twsrc%5Etfw">October 12, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/interpreting-go.htmlSat, 12 Oct 2019 00:00:00 +0000Administering Kubernetes is hardhttp://notes.eatonphil.com/administering-kubernetes-is-hard.html<p>Kubernetes is easy to use after some exposure; it's pretty convenient too. But it is super hard to set up.</p> <p><a href="https://eksctl.io">eksctl</a> is a good tool for folks who don't want to spend hours/days/weeks debugging VPC configuration in 1000s of lines of CloudFormation. None of the other tools seem to be that much easier to use (kops, kubeadm, etc.).</p> <p>But even with EKS and eksctl you are constrained to Amazon Linux worker nodes. AMIs are practically impossible to discover.</p> <p>I haven't spent much time with GKE.</p> <p>And while eksctl operates on the right level for developers needing to administrate small/medium-sized systems, it... 
doesn't exist outside EKS.</p> <p>It is unfortunate that the only major container orchestration system is this complex to administer. The user-facing APIs are pretty solid and guide toward sustainable system design. It is <em>really</em> hard to see the value for most companies with medium-sized deployments tasked with administration. Among serious proprietary alternatives, sure, there's ECS and Google App Engine. But existing Kubernetes knowledge is of little advantage with them. The OSS alternatives don't have the adoption to seem like a good investment.</p> <p>OpenStack's <a href="https://wiki.openstack.org/wiki/Magnum">magnum</a> or OpenShift seem like possible high-level providers for a generic environment. But neither is particularly known for stability.</p> <p>In all, the ecosystem has gotten friendlier. There will probably be a time in the future (3-5 years from now?) when Kubernetes is fairly easy to administer.</p> <p>I'd love to hear your thoughts and experiences administering Kubernetes.</p> http://notes.eatonphil.com/administering-kubernetes-is-hard.htmlMon, 30 Sep 2019 00:00:00 +0000Unit testing C code with gtesthttp://notes.eatonphil.com/unit-testing-c-code-with-gtest.html<p>This post covers building and testing a minimal, but still useful, C project. We'll use <a href="https://github.com/google/googletest">Google's gtest</a> and <a href="https://cmake.org">CMake</a> for testing C code. This will serve as a foundation for some upcoming posts/projects on programming Linux, userland networking and interpreters.</p> <p class="note"> The first version of this post only included one module to test. The <code>test/CMakeLists.txt</code> would also only expose a single pass-fail status for all modules. The second version of this post extends the <code>test/CMakeLists.txt</code> to expose each <code>test/*.cpp</code> file as its own CMake test so that results are displayed by <code>ctest</code> per file. 
The second version also splits the original <code>src/testy.c</code> and <code>include/testy/testy.h</code> module into a <code>widget</code> and <code>customer</code> module to demonstrate the changes to the CMake configuration. </p><h3 id="the-"testy"-sample-project">The "testy" sample project</h3><p>In this project, we'll put source code in <code>src/</code> and publicly exported symbols (functions, structs, etc.) in header files in <code>include/testy/</code>. There will be a <code>main.c</code> in the <code>src/</code> directory. Tests are written in C++ (since gtest is a C++ testing framework) and are in the <code>test/</code> directory.</p> <p>Here's an overview of the source and test code.</p> <h4 id="src/widget.c">src/widget.c</h4><p>This file has some library code that we should be able to test.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/widget.h&quot;</span> <span class="kt">int</span><span class="w"> </span><span class="n">private_ok_value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">2</span><span class="p">;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">widget_ok</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="w"> </span><span class="n">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">private_ok_value</span><span class="p">;</span> <span class="p">}</span> </pre></div> <h4 id="include/testy/widget.h">include/testy/widget.h</h4><p>This 
file handles exported symbols for widget code.</p> <div class="highlight"><pre><span></span><span class="cp">#ifndef _WIDGET_H_</span> <span class="cp">#define _WIDGET_H_</span> <span class="kt">int</span><span class="w"> </span><span class="nf">widget_ok</span><span class="p">(</span><span class="kt">int</span><span class="p">,</span><span class="w"> </span><span class="kt">int</span><span class="p">);</span> <span class="cp">#endif</span> </pre></div> <h4 id="src/customer.c">src/customer.c</h4><p>This file has some more library code that we should be able to test.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/customer.h&quot;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">customer_check</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">a</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">a</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">5</span><span class="p">;</span> <span class="p">}</span> </pre></div> <h4 id="include/testy/customer.h">include/testy/customer.h</h4><p>This file handles exported symbols for customer code.</p> <div class="highlight"><pre><span></span><span class="cp">#ifndef _CUSTOMER_H_</span> <span class="cp">#define _CUSTOMER_H_</span> <span class="kt">int</span><span class="w"> </span><span class="nf">customer_check</span><span class="p">(</span><span class="kt">int</span><span class="p">);</span> <span class="cp">#endif</span> </pre></div> <h4 id="src/main.c">src/main.c</h4><p>This is the entrypoint to a program built around libtesty.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/customer.h&quot;</span> <span 
class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/widget.h&quot;</span> <span class="kt">int</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">customer_check</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </pre></div> <h4 id="test/widget.cpp">test/widget.cpp</h4><p>This is one of our test files. It registers test cases and uses gtest to make assertions. 
We need to wrap the <code>testy/widget.h</code> include in an <code>extern "C"</code> block so the C++ compiler does not <a href="https://www.geeksforgeeks.org/extern-c-in-c/">name-mangle</a> the declarations; mangled names would not match the symbols in the compiled C code at link time.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;gtest/gtest.h&quot;</span> <span class="k">extern</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w"> </span><span class="p">{</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/widget.h&quot;</span> <span class="p">}</span> <span class="n">TEST</span><span class="p">(</span><span class="n">widget</span><span class="p">,</span><span class="w"> </span><span class="n">ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="p">}</span> <span class="n">TEST</span><span class="p">(</span><span class="n">testy</span><span class="p">,</span><span class="w"> </span><span class="n">not_ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">widget_ok</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>You can see a good high-level overview of gtest testing utilities like <code>ASSERT_EQ</code> and <code>TEST</code> <a 
href="https://github.com/google/googletest/blob/master/googletest/docs/primer.md">here</a>.</p> <h4 id="test/customer.cpp">test/customer.cpp</h4><p>This is another one of our test files.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;gtest/gtest.h&quot;</span> <span class="k">extern</span><span class="w"> </span><span class="s">&quot;C&quot;</span><span class="w"> </span><span class="p">{</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;testy/customer.h&quot;</span> <span class="p">}</span> <span class="n">TEST</span><span class="p">(</span><span class="n">customer</span><span class="p">,</span><span class="w"> </span><span class="n">ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">customer_check</span><span class="p">(</span><span class="mi">5</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="p">}</span> <span class="n">TEST</span><span class="p">(</span><span class="n">testy</span><span class="p">,</span><span class="w"> </span><span class="n">not_ok</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">ASSERT_EQ</span><span class="p">(</span><span class="n">customer_check</span><span class="p">(</span><span class="mi">0</span><span class="p">),</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span><span class="w"> </span> <span class="p">}</span> </pre></div> <h4 id="test/main.cpp">test/main.cpp</h4><p>This is a standard entrypoint for the test runner.</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;gtest/gtest.h&quot;</span> <span class="kt">int</span><span class="w"> </span><span 
class="nf">main</span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="kt">char</span><span class="w"> </span><span class="o">**</span><span class="n">argv</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="o">::</span><span class="n">testing</span><span class="o">::</span><span class="n">InitGoogleTest</span><span class="p">(</span><span class="o">&amp;</span><span class="n">argc</span><span class="p">,</span><span class="w"> </span><span class="n">argv</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">RUN_ALL_TESTS</span><span class="p">();</span> <span class="p">}</span> </pre></div> <h3 id="building-with-cmake">Building with CMake</h3><p><a href="https://cmake.org">CMake</a> is a build tool that (among other things) produces a Makefile we can run to build our code. We will also use it for dependency management. But fundamentally we use it because gtest requires it.</p> <p>CMake options/rules are defined in a CMakeLists.txt file. 
We'll have one in the root directory, one in the test directory, and a template for one that will handle the gtest dependency.</p> <p>A first draft of the top-level CMakeLists.txt might look like this:</p> <div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span> <span class="nb">project</span><span class="p">(</span><span class="s">testy</span><span class="p">)</span> <span class="c">##</span> <span class="c">### Source definitions ###</span> <span class="c">##</span> <span class="nb">include_directories</span><span class="p">(</span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/include&quot;</span><span class="p">)</span> <span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/src/*.c&quot;</span><span class="p">)</span> <span class="nb">add_executable</span><span class="p">(</span><span class="s">testy</span><span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span><span class="p">)</span> </pre></div> <p>Using <code>include_directories</code> makes sure we compile with the <code>-I</code> flag pointing at our include directory.</p> <p>Using <code>add_executable</code> names the binary to produce from the given sources. And we're using the <code>file</code> helper to get a glob match of C files rather than listing each one in the <code>add_executable</code> call.</p> <h4 id="building-and-running">Building and running</h4><p>CMake writes many generated files into the directory it runs in, but it is fine running in a different directory, so we'll make a <code>build/</code> directory to keep the project root clean. 
Then we'll build a Makefile with CMake, run Make, and run our program.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>mkdir<span class="w"> </span>build $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build $<span class="w"> </span>cmake<span class="w"> </span>.. --<span class="w"> </span>The<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>identification<span class="w"> </span>is<span class="w"> </span>AppleClang<span class="w"> </span><span class="m">10</span>.0.1.10010046 --<span class="w"> </span>The<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>identification<span class="w"> </span>is<span class="w"> </span>AppleClang<span class="w"> </span><span class="m">10</span>.0.1.10010046 --<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>C<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/cc --<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>C<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/cc<span class="w"> </span>--<span class="w"> </span>works --<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info --<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info<span class="w"> </span>-<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compile<span class="w"> </span>features --<span class="w"> </span>Detecting<span class="w"> </span>C<span class="w"> </span>compile<span class="w"> </span>features<span class="w"> </span>-<span 
class="w"> </span><span class="k">done</span> --<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>CXX<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/c++ --<span class="w"> </span>Check<span class="w"> </span><span class="k">for</span><span class="w"> </span>working<span class="w"> </span>CXX<span class="w"> </span>compiler:<span class="w"> </span>/Library/Developer/CommandLineTools/usr/bin/c++<span class="w"> </span>--<span class="w"> </span>works --<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info --<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compiler<span class="w"> </span>ABI<span class="w"> </span>info<span class="w"> </span>-<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compile<span class="w"> </span>features --<span class="w"> </span>Detecting<span class="w"> </span>CXX<span class="w"> </span>compile<span class="w"> </span>features<span class="w"> </span>-<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build $<span class="w"> </span>make <span class="o">[</span><span class="w"> </span><span class="m">25</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/customer.c.o <span 
class="o">[</span><span class="w"> </span><span class="m">50</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/widget.c.o <span class="o">[</span><span class="w"> </span><span class="m">75</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/main.c.o <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>C<span class="w"> </span>executable<span class="w"> </span>testy <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>testy $<span class="w"> </span>./testy $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">1</span> </pre></div> <h3 id="cmakelists.txt.in">CMakeLists.txt.in</h3><p>This template file handles downloading the gtest dependency from github.com pinned to a release. 
It will be copied into a subdirectory during the <code>cmake ..</code> step.</p> <div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span> <span class="nb">project</span><span class="p">(</span><span class="s">googletest-download</span><span class="w"> </span><span class="s">NONE</span><span class="p">)</span> <span class="nb">include</span><span class="p">(</span><span class="s">ExternalProject</span><span class="p">)</span> <span class="nb">ExternalProject_Add</span><span class="p">(</span><span class="s">googletest</span> <span class="w"> </span><span class="s">GIT_REPOSITORY</span><span class="w"> </span><span class="s">https://github.com/google/googletest.git</span> <span class="w"> </span><span class="s">GIT_TAG</span><span class="w"> </span><span class="s">release-1.8.1</span> <span class="w"> </span><span class="s">SOURCE_DIR</span><span class="w"> </span><span class="s2">&quot;${CMAKE_BINARY_DIR}/googletest-src&quot;</span> <span class="w"> </span><span class="s">BINARY_DIR</span><span class="w"> </span><span class="s2">&quot;${CMAKE_BINARY_DIR}/googletest-build&quot;</span> <span class="w"> </span><span class="s">CONFIGURE_COMMAND</span><span class="w"> </span><span class="s2">&quot;&quot;</span> <span class="w"> </span><span class="s">BUILD_COMMAND</span><span class="w"> </span><span class="s2">&quot;&quot;</span> <span class="w"> </span><span class="s">INSTALL_COMMAND</span><span class="w"> </span><span class="s2">&quot;&quot;</span> <span class="w"> </span><span class="s">TEST_COMMAND</span><span class="w"> </span><span class="s2">&quot;&quot;</span> <span class="p">)</span> </pre></div> <p>Now we can tell CMake about it and how to build, within the top-level CMakeLists.txt file.</p> <div class="highlight"><pre><span></span><span class="nb">cmake_minimum_required</span><span 
class="p">(</span><span class="s">VERSION</span><span class="w"> </span><span class="s">3.1</span><span class="p">)</span> <span class="nb">project</span><span class="p">(</span><span class="s">testy</span><span class="p">)</span> <span class="c">##</span> <span class="c">### Test definitions ###</span> <span class="c">##</span> <span class="nb">configure_file</span><span class="p">(</span><span class="s">CMakeLists.txt.in</span> <span class="w"> </span><span class="s">googletest-download/CMakeLists.txt</span><span class="p">)</span> <span class="nb">execute_process</span><span class="p">(</span><span class="s">COMMAND</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_COMMAND</span><span class="o">}</span><span class="w"> </span><span class="s">-G</span><span class="w"> </span><span class="s2">&quot;${CMAKE_GENERATOR}&quot;</span><span class="w"> </span><span class="s">.</span> <span class="w"> </span><span class="s">WORKING_DIRECTORY</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-download</span><span class="w"> </span><span class="p">)</span> <span class="nb">execute_process</span><span class="p">(</span><span class="s">COMMAND</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_COMMAND</span><span class="o">}</span><span class="w"> </span><span class="s">--build</span><span class="w"> </span><span class="s">.</span> <span class="w"> </span><span class="s">WORKING_DIRECTORY</span><span class="w"> </span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-download</span><span class="w"> </span><span class="p">)</span> <span class="nb">add_subdirectory</span><span class="p">(</span><span class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-src</span> <span class="w"> </span><span 
class="o">${</span><span class="nv">CMAKE_BINARY_DIR</span><span class="o">}</span><span class="s">/googletest-build</span><span class="p">)</span> <span class="nb">enable_testing</span><span class="p">()</span> <span class="nb">add_subdirectory</span><span class="p">(</span><span class="s">test</span><span class="p">)</span> <span class="c">##</span> <span class="c">### Source definitions ###</span> <span class="c">##</span> <span class="nb">include_directories</span><span class="p">(</span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/include&quot;</span><span class="p">)</span> <span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span> <span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/include/testy/*.h&quot;</span> <span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/src/*.c&quot;</span><span class="p">)</span> <span class="nb">add_executable</span><span class="p">(</span><span class="s">testy</span><span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span><span class="p">)</span> </pre></div> <p>The <code>add_subdirectory</code> calls register a directory that contains its own CMakeLists.txt. 
Configuration would fail at this point without a <code>CMakeLists.txt</code> file in the <code>test/</code> directory.</p> <h3 id="test/cmakelists.txt">test/CMakeLists.txt</h3><p>This final file registers one <code>&lt;name&gt;_tests</code> executable per test file (for example <code>widget_tests</code>), each compiled against the source and test code, and includes the project header files.</p> <div class="highlight"><pre><span></span><span class="nb">include_directories</span><span class="p">(</span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/include&quot;</span><span class="p">)</span> <span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/src/*.c&quot;</span><span class="p">)</span> <span class="nb">list</span><span class="p">(</span><span class="s">REMOVE_ITEM</span><span class="w"> </span><span class="s">sources</span><span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/src/main.c&quot;</span><span class="p">)</span> <span class="nb">file</span><span class="p">(</span><span class="s">GLOB</span><span class="w"> </span><span class="s">tests</span><span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/test/*.cpp&quot;</span><span class="p">)</span> <span class="nb">list</span><span class="p">(</span><span class="s">REMOVE_ITEM</span><span class="w"> </span><span class="s">tests</span><span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/test/main.cpp&quot;</span><span class="p">)</span> <span class="nb">foreach</span><span class="p">(</span><span class="s">file</span><span class="w"> </span><span class="o">${</span><span class="nv">tests</span><span class="o">}</span><span class="p">)</span> <span class="w"> </span><span class="nb">set</span><span class="p">(</span><span class="s">name</span><span class="p">)</span> <span class="w"> </span><span class="nb">get_filename_component</span><span class="p">(</span><span class="s">name</span><span class="w"> 
</span><span class="o">${</span><span class="nv">file</span><span class="o">}</span><span class="w"> </span><span class="s">NAME_WE</span><span class="p">)</span> <span class="w"> </span><span class="nb">add_executable</span><span class="p">(</span><span class="s2">&quot;${name}_tests&quot;</span> <span class="w"> </span><span class="o">${</span><span class="nv">sources</span><span class="o">}</span> <span class="w"> </span><span class="o">${</span><span class="nv">file</span><span class="o">}</span> <span class="w"> </span><span class="s2">&quot;${PROJECT_SOURCE_DIR}/test/main.cpp&quot;</span><span class="p">)</span> <span class="w"> </span><span class="nb">target_link_libraries</span><span class="p">(</span><span class="s2">&quot;${name}_tests&quot;</span><span class="w"> </span><span class="s">gtest_main</span><span class="p">)</span> <span class="w"> </span><span class="nb">add_test</span><span class="p">(</span><span class="s">NAME</span><span class="w"> </span><span class="o">${</span><span class="nv">name</span><span class="o">}</span><span class="w"> </span><span class="s">COMMAND</span><span class="w"> </span><span class="s2">&quot;${name}_tests&quot;</span><span class="p">)</span> <span class="nb">endforeach</span><span class="p">()</span> </pre></div> <p>We have to register a separate test for each file; otherwise the individual files' results won't show up in the <code>ctest</code> output by default (i.e. without a <code>--verbose</code> flag).</p> <h4 id="building-and-running-tests">Building and running tests</h4><p>As with building and running the source, we run CMake in a build subdirectory. After building all sources and tests with <code>make</code>, we run <code>make test</code> or <code>ctest</code>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>build $<span class="w"> </span>cmake<span class="w"> </span>.. 
--<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build/googletest-download Scanning<span class="w"> </span>dependencies<span class="w"> </span>of<span class="w"> </span>target<span class="w"> </span>googletest <span class="o">[</span><span class="w"> </span><span class="m">11</span>%<span class="o">]</span><span class="w"> </span>Creating<span class="w"> </span>directories<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">22</span>%<span class="o">]</span><span class="w"> </span>Performing<span class="w"> </span>download<span class="w"> </span>step<span class="w"> </span><span class="o">(</span>git<span class="w"> </span>clone<span class="o">)</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> Cloning<span class="w"> </span>into<span class="w"> </span><span class="s1">&#39;googletest-src&#39;</span>... Note:<span class="w"> </span>checking<span class="w"> </span>out<span class="w"> </span><span class="s1">&#39;release-1.8.1&#39;</span>. 
You<span class="w"> </span>are<span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="s1">&#39;detached HEAD&#39;</span><span class="w"> </span>state.<span class="w"> </span>You<span class="w"> </span>can<span class="w"> </span>look<span class="w"> </span>around,<span class="w"> </span>make<span class="w"> </span>experimental changes<span class="w"> </span>and<span class="w"> </span>commit<span class="w"> </span>them,<span class="w"> </span>and<span class="w"> </span>you<span class="w"> </span>can<span class="w"> </span>discard<span class="w"> </span>any<span class="w"> </span>commits<span class="w"> </span>you<span class="w"> </span>make<span class="w"> </span><span class="k">in</span><span class="w"> </span>this state<span class="w"> </span>without<span class="w"> </span>impacting<span class="w"> </span>any<span class="w"> </span>branches<span class="w"> </span>by<span class="w"> </span>performing<span class="w"> </span>another<span class="w"> </span>checkout. 
If<span class="w"> </span>you<span class="w"> </span>want<span class="w"> </span>to<span class="w"> </span>create<span class="w"> </span>a<span class="w"> </span>new<span class="w"> </span>branch<span class="w"> </span>to<span class="w"> </span>retain<span class="w"> </span>commits<span class="w"> </span>you<span class="w"> </span>create,<span class="w"> </span>you<span class="w"> </span>may <span class="k">do</span><span class="w"> </span>so<span class="w"> </span><span class="o">(</span>now<span class="w"> </span>or<span class="w"> </span>later<span class="o">)</span><span class="w"> </span>by<span class="w"> </span>using<span class="w"> </span>-b<span class="w"> </span>with<span class="w"> </span>the<span class="w"> </span>checkout<span class="w"> </span><span class="nb">command</span><span class="w"> </span>again.<span class="w"> </span>Example: <span class="w"> </span>git<span class="w"> </span>checkout<span class="w"> </span>-b<span class="w"> </span>&lt;new-branch-name&gt; HEAD<span class="w"> </span>is<span class="w"> </span>now<span class="w"> </span>at<span class="w"> </span>2fe3bd99<span class="w"> </span>Merge<span class="w"> </span>pull<span class="w"> </span>request<span class="w"> </span><span class="c1">#1433 from dsacre/fix-clang-warnings</span> <span class="o">[</span><span class="w"> </span><span class="m">33</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>patch<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">44</span>%<span class="o">]</span><span class="w"> </span>Performing<span class="w"> </span>update<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">55</span>%<span class="o">]</span><span 
class="w"> </span>No<span class="w"> </span>configure<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">66</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>build<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span>install<span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="w"> </span><span class="m">88</span>%<span class="o">]</span><span class="w"> </span>No<span class="w"> </span><span class="nb">test</span><span class="w"> </span>step<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Completed<span class="w"> </span><span class="s1">&#39;googletest&#39;</span> <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>googletest --<span class="w"> </span>Found<span class="w"> </span>PythonInterp:<span class="w"> </span>/usr/local/bin/python<span class="w"> </span><span class="o">(</span>found<span class="w"> </span>version<span class="w"> </span><span class="s2">&quot;2.7.16&quot;</span><span class="o">)</span> --<span class="w"> </span>Looking<span class="w"> </span><span class="k">for</span><span class="w"> </span>pthread.h --<span class="w"> </span>Looking<span class="w"> </span><span class="k">for</span><span class="w"> </span>pthread.h<span 
class="w"> </span>-<span class="w"> </span>found --<span class="w"> </span>Performing<span class="w"> </span>Test<span class="w"> </span>CMAKE_HAVE_LIBC_PTHREAD --<span class="w"> </span>Performing<span class="w"> </span>Test<span class="w"> </span>CMAKE_HAVE_LIBC_PTHREAD<span class="w"> </span>-<span class="w"> </span>Success --<span class="w"> </span>Found<span class="w"> </span>Threads:<span class="w"> </span>TRUE --<span class="w"> </span>Configuring<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Generating<span class="w"> </span><span class="k">done</span> --<span class="w"> </span>Build<span class="w"> </span>files<span class="w"> </span>have<span class="w"> </span>been<span class="w"> </span>written<span class="w"> </span>to:<span class="w"> </span>/Users/philipeaton/tmp/testy/build $<span class="w"> </span>make <span class="o">[</span><span class="w"> </span><span class="m">4</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/customer.c.o <span class="o">[</span><span class="w"> </span><span class="m">9</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/widget.c.o <span class="o">[</span><span class="w"> </span><span class="m">13</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>CMakeFiles/testy.dir/src/main.c.o <span class="o">[</span><span class="w"> </span><span class="m">18</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>C<span class="w"> </span>executable<span class="w"> </span>testy <span class="o">[</span><span class="w"> </span><span class="m">18</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>testy 
<span class="o">[</span><span class="w"> </span><span class="m">22</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o <span class="o">[</span><span class="w"> </span><span class="m">27</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgtest.a <span class="o">[</span><span class="w"> </span><span class="m">27</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gtest <span class="o">[</span><span class="w"> </span><span class="m">31</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o <span class="o">[</span><span class="w"> </span><span class="m">36</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgmock.a <span class="o">[</span><span class="w"> </span><span class="m">36</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gmock <span class="o">[</span><span class="w"> </span><span class="m">40</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o <span class="o">[</span><span class="w"> </span><span class="m">45</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgmock_main.a <span 
class="o">[</span><span class="w"> </span><span class="m">45</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gmock_main <span class="o">[</span><span class="w"> </span><span class="m">50</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>googletest-build/googlemock/gtest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o <span class="o">[</span><span class="w"> </span><span class="m">54</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>static<span class="w"> </span>library<span class="w"> </span>libgtest_main.a <span class="o">[</span><span class="w"> </span><span class="m">54</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>gtest_main <span class="o">[</span><span class="w"> </span><span class="m">59</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/__/src/customer.c.o <span class="o">[</span><span class="w"> </span><span class="m">63</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/__/src/widget.c.o <span class="o">[</span><span class="w"> </span><span class="m">68</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/customer.cpp.o <span class="o">[</span><span class="w"> </span><span class="m">72</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/customer_tests.dir/main.cpp.o <span class="o">[</span><span 
class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>executable<span class="w"> </span>customer_tests <span class="o">[</span><span class="w"> </span><span class="m">77</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>customer_tests Scanning<span class="w"> </span>dependencies<span class="w"> </span>of<span class="w"> </span>target<span class="w"> </span>widget_tests <span class="o">[</span><span class="w"> </span><span class="m">81</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/__/src/customer.c.o <span class="o">[</span><span class="w"> </span><span class="m">86</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>C<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/__/src/widget.c.o <span class="o">[</span><span class="w"> </span><span class="m">90</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/widget.cpp.o <span class="o">[</span><span class="w"> </span><span class="m">95</span>%<span class="o">]</span><span class="w"> </span>Building<span class="w"> </span>CXX<span class="w"> </span>object<span class="w"> </span>test/CMakeFiles/widget_tests.dir/main.cpp.o <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Linking<span class="w"> </span>CXX<span class="w"> </span>executable<span class="w"> </span>widget_tests <span class="o">[</span><span class="m">100</span>%<span class="o">]</span><span class="w"> </span>Built<span class="w"> </span>target<span class="w"> </span>widget_tests </pre></div> <p>After running <code>cmake</code> and 
<code>make</code>, we're finally ready to run <code>ctest</code>.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>ctest Test<span class="w"> </span>project<span class="w"> </span>/Users/philipeaton/tmp/testy/build <span class="w"> </span>Start<span class="w"> </span><span class="m">1</span>:<span class="w"> </span>customer <span class="m">1</span>/2<span class="w"> </span>Test<span class="w"> </span><span class="c1">#1: customer .......................... Passed 0.01 sec</span> <span class="w"> </span>Start<span class="w"> </span><span class="m">2</span>:<span class="w"> </span>widget <span class="m">2</span>/2<span class="w"> </span>Test<span class="w"> </span><span class="c1">#2: widget ............................ Passed 0.00 sec</span> <span class="m">100</span>%<span class="w"> </span>tests<span class="w"> </span>passed,<span class="w"> </span><span class="m">0</span><span class="w"> </span>tests<span class="w"> </span>failed<span class="w"> </span>out<span class="w"> </span>of<span class="w"> </span><span class="m">2</span> Total<span class="w"> </span>Test<span class="w"> </span><span class="nb">time</span><span class="w"> </span><span class="o">(</span>real<span class="o">)</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span>.01<span class="w"> </span>sec </pre></div> <p>Now we're in a good place with most of the challenges of unit testing C code (i.e. 
ignoring mocks) past us.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">In preparation for a couple new articles on some C projects, here&#39;s a foundational post on building C code and writing/running unit tests with gtest and cmake <a href="https://t.co/aMVyr7LO73">https://t.co/aMVyr7LO73</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1167826536298405894?ref_src=twsrc%5Etfw">August 31, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/unit-testing-c-code-with-gtest.htmlSat, 31 Aug 2019 00:00:00 +0000Writing an x86 emulator from scratch in JavaScript: 2. system callshttp://notes.eatonphil.com/emulator-basics-system-calls.html<p class="note"> Previously in emulator basics: <!-- forgive me, for I have sinned --> <br /> <a href="/emulator-basics-a-stack-and-register-machine.html">1. a stack and register machine</a> </p><p>In this post we'll extend <a href="https://github.com/eatonphil/x86e">x86e</a> to support the exit and write Linux system calls, or syscalls. A syscall is a function handled by the kernel that allows the process to interact with data outside of its memory. Arguments to a syscall are passed in registers in nearly the same order as for a regular function call made with <code>CALL</code>: <code>RDI</code>, <code>RSI</code>, <code>RDX</code>, and so on (the one difference on Linux is that the fourth argument goes in <code>R10</code> rather than <code>RCX</code>). The <code>SYSCALL</code> instruction additionally requires the <code>RAX</code> register to contain the integer number of the syscall.</p> <p>Historically, there have been a number of different ways to make syscalls. All methods perform variations on a software interrupt. Before AMD64, on x86 processors, there was the <code>SYSENTER</code> instruction. And before that there was only <code>INT 80h</code> to trigger the interrupt with the syscall handler (since interrupts can be used for more than just syscalls). 
The various instructions around interrupts were added for efficiency as processors, and the operating systems using them, evolved.</p> <p>Since this is a general need and AMD64 processors are among the most common today, you'll see similar code in every modern operating system such as FreeBSD, OpenBSD, NetBSD, macOS, and Linux. (I have no background in Windows.) The calling convention may differ (e.g. which arguments are in which registers) and the syscall numbers differ. Even within Linux both the calling convention and the syscall numbers differ between x86 (32-bit) and AMD64/x86_64 (64-bit) versions.</p> <p>See this <a href="https://stackoverflow.com/a/15169141/1507139">StackOverflow post</a> for some more detail.</p> <p><a href="https://gist.github.com/eatonphil/2d16bc3dae33bff8a8d7f2a9d13025c3">Code for this post in full is available as a Gist.</a></p> <h4 id="exit">Exit</h4><p>The exit syscall is how a child process communicates with the process that spawned it (its parent) when the child is finished running. Exit takes one argument, called the exit code or status code. Only the low 8 bits of this integer are passed on to the parent, so the effective range is 0 to 255. Shells interpret any non-zero exit code as a "failure". By a further common shell convention, a process that was terminated abnormally by a signal (such as a segfault) is reported with a code of 128 plus the signal number. Otherwise, and ignoring these two common conventions, it can be used to mean anything the programmer wants.</p> <p class="note"> The wait syscall is how the parent process can block until exit is called by the child and receive its exit code. </p><p>On AMD64 Linux the syscall number is 60. 
For example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">60</span> <span class="w"> </span><span class="nf">SYSCALL</span> </pre></div> <p>This calls exit with a status code of 0.</p> <h4 id="write">Write</h4><p>The write syscall is how a process can send data to file descriptors, which are integers representing some file-like object. By default, a Linux process is given access to three file descriptors with consistent integer values: stdin is 0, stdout is 1, and stderr is 2. Write takes three arguments: the file descriptor integer to write to, a starting address in memory that is interpreted as the beginning of a byte array, and the number of bytes to write to the file descriptor beginning at the start address.</p> <p>On AMD64 Linux the syscall number is 1. 
For example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; stdout</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">R12</span><span class="w"> </span><span class="c1">; address of string</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDX</span><span class="p">,</span><span class="w"> </span><span class="mi">8</span><span class="w"> </span><span class="c1">; 8 bytes to write</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; write</span> <span class="w"> </span><span class="nf">SYSCALL</span> </pre></div> <p>This writes 8 bytes to stdout starting from the string whose address is in R12.</p> <h3 id="implementing-syscalls">Implementing syscalls</h3><p>Our emulator is simplistic and is currently only implementing process emulation, not full CPU emulation. So the syscalls themselves will be handled in JavaScript. First we'll write out stubs for the two syscalls we are adding. 
And we'll provide a map from syscall id to the syscall.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="mf">60</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_exit</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span> <span class="p">};</span> </pre></div> <p>We need to add an instruction handler to our instruction switch. 
In doing so we must convert the value in <code>RAX</code> from a BigInt to a regular Number so we can look it up in the syscall map.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;syscall&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">idNumber</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RAX</span><span class="p">);</span> <span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="p">[</span><span class="nx">idNumber</span><span class="p">](</span><span class="nx">process</span><span class="p">);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="exit">Exit</h4><p>Exit is really simple. It will be implemented by calling Node's <code>global.process.exit()</code>. 
Again we'll need to convert the register's BigInt value to a Number.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="mf">60</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_exit</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nb">global</span><span class="p">.</span><span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDI</span><span class="p">));</span> <span class="w"> </span><span class="p">},</span> <span class="p">};</span> </pre></div> <h4 id="write">Write</h4><p>Write will be implemented by iterating over the process memory as bytes and by calling <code>write()</code> on the relevant file descriptor. 
We'll store a map of these on the process object on startup. The snippet here wires up only stdout; stderr and stdin proxies can be supplied the same way.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">file</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">registers</span><span class="p">,</span> <span class="w"> </span><span class="nx">memory</span><span class="p">,</span> <span class="w"> </span><span class="nx">instructions</span><span class="p">,</span> <span class="w"> </span><span class="nx">labels</span><span class="p">,</span> <span class="w"> </span><span class="nx">fd</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// stdout</span> <span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="nb">global</span><span class="p">.</span><span class="nx">process</span><span class="p">.</span><span class="nx">stdout</span><span class="p">,</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>The base address is stored in <code>RSI</code> and the number of bytes to write is stored in <code>RDX</code>. 
And the file descriptor to write to is stored in <code>RDI</code>.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALLS_BY_ID</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="mf">1</span><span class="o">:</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">sys_write</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSI</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDX</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">bytes</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">byte</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">msg</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">i</span><span class="p">),</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">String</span><span class="p">.</span><span class="nx">fromCharCode</span><span class="p">(</span><span class="nb">Number</span><span class="p">(</span><span class="kr">byte</span><span class="p">));</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">fd</span><span class="p">[</span><span class="nb">Number</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RDI</span><span class="p">)].</span><span class="nx">write</span><span class="p">(</span><span class="kr">char</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="p">...</span> </pre></div> <h3 id="all-together">All together</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>exit3.asm main: <span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">1</span> <span class="w"> </span>MOV<span class="w"> </span>RSI,<span class="w"> </span><span 
class="m">2</span> <span class="w"> </span>ADD<span class="w"> </span>RDI,<span class="w"> </span>RSI <span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">60</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="nb">exit</span> <span class="w"> </span>SYSCALL $<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>exit3.asm $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">3</span> </pre></div> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>hello.asm main: <span class="w"> </span>PUSH<span class="w"> </span><span class="m">10</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="se">\n</span> <span class="w"> </span>PUSH<span class="w"> </span><span class="m">33</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>! 
<span class="w"> </span>PUSH<span class="w"> </span><span class="m">111</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>o <span class="w"> </span>PUSH<span class="w"> </span><span class="m">108</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>l <span class="w"> </span>PUSH<span class="w"> </span><span class="m">108</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>l <span class="w"> </span>PUSH<span class="w"> </span><span class="m">101</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>e <span class="w"> </span>PUSH<span class="w"> </span><span class="m">72</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>H <span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>stdout <span class="w"> </span>MOV<span class="w"> </span>RSI,<span class="w"> </span>RSP<span class="w"> </span><span class="p">;</span><span class="w"> </span>address<span class="w"> </span>of<span class="w"> </span>string <span class="w"> </span>MOV<span class="w"> </span>RDX,<span class="w"> </span><span class="m">56</span><span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="m">7</span><span class="w"> </span><span class="m">8</span>-bit<span class="w"> </span>characters<span class="w"> </span><span class="k">in</span><span class="w"> </span>the<span class="w"> </span>string<span class="w"> </span>but<span class="w"> </span>PUSH<span class="w"> </span>acts<span class="w"> </span>on<span class="w"> </span><span class="m">64</span>-bit<span class="w"> </span>integers <span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>write <span class="w"> </span>SYSCALL <span class="w"> </span>MOV<span 
class="w"> </span>RDI,<span class="w"> </span><span class="m">0</span> <span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">60</span> <span class="w"> </span>SYSCALL $<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>hello.asm Hello! $ </pre></div> <h3 id="next-steps">Next steps</h3><p>We still aren't setting flags appropriately to support conditionals, so that's low-hanging fruit. There are some other fun syscalls to implement that would also give us access to an emulated VGA card so we could render graphics. Syntactic support for string constants would also be convenient and more efficient.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post in the emulator basics series up: implementing some syscalls starting with sys_exit and sys_write so we can print a nice hello message. <a href="https://t.co/NEfId0lnJx">https://t.co/NEfId0lnJx</a> <a href="https://twitter.com/hashtag/javascript?src=hash&amp;ref_src=twsrc%5Etfw">#javascript</a> <a href="https://twitter.com/hashtag/x86?src=hash&amp;ref_src=twsrc%5Etfw">#x86</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1152689255900176386?ref_src=twsrc%5Etfw">July 20, 2019</a></blockquote></p> http://notes.eatonphil.com/emulator-basics-system-calls.htmlSat, 20 Jul 2019 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 6. LLVM system callshttp://notes.eatonphil.com/compiler-basics-llvm-system-calls.html<p class="note"> Previously in compiler basics: <br /> <a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a> <br /> <a href="/compiler-basics-functions.html">2. user-defined functions and variables</a> <br /> <a href="/compiler-basics-llvm.html">3. LLVM</a> <br /> <a href="/compiler-basics-llvm-conditionals.html">4. 
LLVM conditionals and compiling fibonacci</a> <br /> Next in compiler basics: <br /> <a href="/compiler-basics-an-x86-upgrade.html">5. an x86 upgrade</a> </p><p>In this post we'll extend the <a href="https://github.com/eatonphil/ulisp">ulisp compiler</a>'s LLVM backend to support printing integers to stdout.</p> <h3 id="exit-code-limitations">Exit code limitations</h3><p>Until now we've validated program state by setting the exit code to the result of the program computation. But the exit code is an eight-bit integer. What if we want to validate a computation that produces a result larger than 255?</p> <p>To do this we need a way to print integers. This is challenging because printing normally deals with byte arrays. libc's <code>printf</code>, for example, takes a byte array as its first argument.</p> <p>The shortest path forward is to add support for system calls so we can print one character at a time. Here's a version of a <code>print</code> form that hacks around not having arrays by sending a number to stdout one digit at a time.</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nv">c</span><span class="p">)</span> <span class="w"> </span><span class="c1">; First argument is stdout</span> <span class="w"> </span><span class="c1">; Second argument is a pointer to a char array (of length one)</span> <span class="w"> </span><span class="c1">; Third argument is the length of the char array</span> <span class="w"> </span><span class="p">(</span><span class="nv">syscall/sys_write</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">&amp;c</span><span class="w"> </span><span class="mi">1</span><span class="p">))</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nb">print</span><span class="w"> </span><span 
class="p">(</span><span class="nv">n</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">&gt;</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">9</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">print</span><span class="w"> </span><span class="p">(</span><span class="nb">/</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">)))</span> <span class="w"> </span><span class="c1">; 48 is the ASCII code for &#39;0&#39;</span> <span class="w"> </span><span class="p">(</span><span class="nv">print-char</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">48</span><span class="w"> </span><span class="p">(</span><span class="nv">%</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">10</span><span class="p">))))</span> </pre></div> <p>To support this we need to add the <code>syscall/sys_write</code>, <code>></code>, <code>%</code>, and <code>/</code> builtin forms. We'll also need to add support for taking the address of a variable.</p> <p>All <a href="https://github.com/eatonphil/ulisp">code is available on GitHub</a>, as is the <a href="https://github.com/eatonphil/ulisp/commit/213b83b8e952c210ba408bf38e59ae677d19e643">particular commit related to this post</a>.</p> <h3 id="references">References</h3><p>The <code>sys_write</code> syscall requires us to pass the memory address of the byte array to write. We don't support arrays, but we can treat an individual variable as an array of length one by passing the variable's address.</p> <p>If we were compiling to C we could just pass the address of a local variable. But LLVM doesn't allow us to take the address of variables directly. 
We need to push the variable onto the LLVM stack to get an address.</p> <p class="note"> Under the hood LLVM will likely optimize this into a local variable reference instead of first pushing to the stack. </p><p>Since LLVM IR is typed, the value representing the address of a local variable will be a pointer type. We'll need to refer to all uses of this value as a pointer. So we'll need to modify ulisp to track local types rather than hard-coding <code>i64</code> everywhere.</p> <h4 id="scope">Scope</h4><p>To begin we'll modify the <code>Scope</code> class to track types. We only need to do this on registration. Afterward, we'll have to find all uses of local variables to make sure they use the local's <code>value</code> and <code>type</code> fields appropriately.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span 
class="nx">locals</span><span class="p">[</span><span class="nx">copy</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="o">:</span><span class="w"> </span><span class="nx">copy</span><span class="p">,</span> <span class="w"> </span><span class="nx">type</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;i64&#39;</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>We won't go through every use of a <code>Scope</code> variable in this post, but you can find it in the related <a href="https://github.com/eatonphil/ulisp/commit/213b83b8e952c210ba408bf38e59ae677d19e643">commit to ulisp</a>.</p> <h4 id="reference">Reference</h4><p>The long-term approach for handling a reference syntactically is probably to rewrite <code>&x</code> to <code>(& x)</code> in the parser. 
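</p> <p>As a rough sketch of that long-term approach (not what ulisp does today), a small pass could desugar reference tokens once parsing is done. The helper name <code>desugarReferences</code> is hypothetical, and the sketch assumes the parser produces nested arrays whose leaves are strings for symbols and numbers for literals.</p>

```javascript
// Hypothetical helper, not part of ulisp: rewrite '&x' tokens into ['&', 'x'].
// Assumption: the AST is nested arrays whose leaves are strings (symbols)
// or numbers (literals).
function desugarReferences(node) {
  if (Array.isArray(node)) {
    // Recurse into every element of a form.
    return node.map(desugarReferences);
  }

  // A bare '&' is left alone so it can still name the reference form itself.
  if (typeof node === 'string' && node.length > 1 && node.startsWith('&')) {
    return ['&', desugarReferences(node.substring(1))];
  }

  return node;
}

const ast = ['syscall/sys_write', 1, '&c', 1];
console.log(JSON.stringify(desugarReferences(ast)));
// prints ["syscall/sys_write",1,["&","c"],1]
```

<p>With a pass like this the compiler could treat <code>&</code> as an ordinary builtin form instead of inspecting identifier names.</p> <p>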
The lazy approach we'll take for now is to handle a reference as a special kind of identifier in <code>compileExpression</code>.</p> <p>We'll use the LLVM <code>alloca</code> instruction to create space on the stack. This will return a pointer and will turn the destination variable into a pointer type. Then we'll use <code>store</code> to set the value at the address to the current value of the variable being referenced.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="c1">// Is a reference, push onto the stack and return its address</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">exp</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;&amp;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">symbol</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">symbol</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = alloca </span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store </span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">tmp</span><span class="p">.</span><span 
class="nx">value</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>And now we're set to take the address of any variable.</p> <h3 id="system-calls">System calls</h3><p>LLVM IR provides no high-level means for making system calls. The only way is to use inline assembly. The syntax is based on GCC inline assembly; it is confusing, has few explained examples, and produces unhelpful error messages.</p> <p>Thankfully the assembly code needed for a syscall is only one line, one word: the <code>syscall</code> assembly instruction. We use the inline assembly's variable-to-register mapping to line up all the parameters for the syscall. 
Here is an example:</p> <div class="highlight"><pre><span></span><span class="nv">%result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="k">asm</span><span class="w"> </span><span class="k">sideeffect</span><span class="w"> </span><span class="s">&quot;syscall&quot;</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;=r,{rax},{rdi},{rsi},{rdx}&quot;</span><span class="w"> </span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%raxArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rdiArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rsiArg</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%rdxArg</span><span class="p">)</span> </pre></div> <p>This says to execute the inline assembly string, "syscall". The <code>sideeffect</code> flag means that this assembly should always be run even if the result isn't used. <code>=r</code> means the inline assembly returns a value, and the rest of the string is the list of registers that arguments should be mapped to. Finally we call the function with all the LLVM variables we want to be mapped.</p> <p class="note"> Eventually we should also use the inline assembly syntax to list registers that are modified so that LLVM can know to save them before and after. </p><h4 id="code">Code</h4><p>We'll add a mapping for <code>syscall/sys_write</code> and a helper function for generating syscall code using the example above as a template. 
We'll support 64-bit Darwin and Linux kernels.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_TABLE</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">darwin</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">sys_write</span><span class="o">:</span><span class="w"> </span><span class="mh">0x2000004</span><span class="p">,</span> <span class="w"> </span><span class="nx">sys_exit</span><span class="o">:</span><span class="w"> </span><span class="mh">0x2000001</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="w"> </span><span class="nx">linux</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">sys_write</span><span class="o">:</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span> <span class="w"> </span><span class="nx">sys_exit</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span> <span class="w"> </span><span class="p">},</span> <span class="p">}[</span><span class="nx">process</span><span class="p">.</span><span class="nx">platform</span><span class="p">];</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;if&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;mul&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;%&#39;</span><span 
class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;urem&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp slt&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;=&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp eq&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;syscall/sys_write&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">SYSCALL_TABLE</span><span class="p">.</span><span class="nx">sys_write</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">id</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span 
class="nx">argTmps</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp</span><span class="p">,</span><span class="w"> </span><span class="nx">context</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">type</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39; %&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">tmp</span><span class="p">.</span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="p">}).</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;, &#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">regs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;rdi&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;rsi&#39;</span><span class="p">,</span><span 
class="w"> </span><span class="s1">&#39;rdx&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;r10&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;r8&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;r9&#39;</span><span class="p">];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">params</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="sb">`{</span><span class="si">${</span><span class="nx">regs</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">}`</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">idTmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">().</span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">idTmp</span><span class="si">}</span><span class="sb"> = add i64 </span><span class="si">${</span><span class="nx">id</span><span class="si">}</span><span class="sb">, 
0`</span><span class="p">)</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = call </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> asm sideeffect &quot;syscall&quot;, &quot;=r,{rax},</span><span class="si">${</span><span class="nx">params</span><span class="si">}</span><span class="sb">,~{dirflag},~{fpsr},~{flags}&quot; (i64 %</span><span class="si">${</span><span class="nx">idTmp</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">argTmps</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h3 id="<code>></code>,-<code>/</code>"><code>></code>, <code>/</code></h3><p>Finally, we have a few new operations to add support for. 
But they'll be pretty simple using the <code>compileOp</code> helper function.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;if&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span 
class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;mul&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;/&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;udiv&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;%&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;urem&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp slt&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&gt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp sgt&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;=&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp 
eq&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;syscall/sys_write&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileSyscall</span><span class="p">(</span><span class="nx">SYSCALL_TABLE</span><span class="p">.</span><span class="nx">sys_write</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <h3 id="print">print</h3><p>We're ready to give our print function a shot.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp <span class="o">(</span>def<span class="w"> </span>print-char<span class="w"> </span><span class="o">(</span>c<span class="o">)</span> <span class="w"> </span><span class="p">;</span><span class="w"> </span>First<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>stdout <span class="w"> </span><span class="p">;</span><span class="w"> </span>Second<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>a<span class="w"> </span>pointer<span class="w"> </span>to<span class="w"> </span>a<span class="w"> </span>char<span class="w"> </span>array<span class="w"> </span><span class="o">(</span>of<span class="w"> </span>length<span class="w"> </span>one<span class="o">)</span> <span class="w"> </span><span class="p">;</span><span class="w"> </span>Third<span class="w"> </span>argument<span class="w"> </span>is<span class="w"> </span>the<span class="w"> </span>length<span class="w"> </span>of<span class="w"> </span>the<span class="w"> </span>char<span class="w"> </span>array <span class="w"> </span><span class="o">(</span>syscall/sys_write<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">&amp;</span>c<span class="w"> </span><span 
class="m">1</span><span class="o">))</span> <span class="o">(</span>def<span class="w"> </span>print<span class="w"> </span><span class="o">(</span>n<span class="o">)</span> <span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(</span>&gt;<span class="w"> </span>n<span class="w"> </span><span class="m">9</span><span class="o">)</span> <span class="w"> </span><span class="o">(</span>print<span class="w"> </span><span class="o">(</span>/<span class="w"> </span>n<span class="w"> </span><span class="m">10</span><span class="o">)))</span> <span class="w"> </span><span class="p">;</span><span class="w"> </span><span class="m">48</span><span class="w"> </span>is<span class="w"> </span>the<span class="w"> </span>ASCII<span class="w"> </span>code<span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="s1">&#39;0&#39;</span> <span class="w"> </span><span class="o">(</span>print-char<span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span class="m">48</span><span class="w"> </span><span class="o">(</span>%<span class="w"> </span>n<span class="w"> </span><span class="m">10</span><span class="o">))))</span> <span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span> <span class="w"> </span><span class="o">(</span>print<span class="w"> </span><span class="m">1234</span><span class="o">)</span> <span class="w"> </span><span class="m">0</span><span class="o">)</span> $<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp $<span class="w"> </span>./build/a.out <span class="m">1234</span> </pre></div> <p>Looks good! In the next post we'll talk about tail call elimination.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">It&#39;s been a slow month for the blog. But new post on compiler basics is up! Printing integers to stdout and making syscalls in LLVM (all without arrays). 
This was a pre-req for playing with tail-call elimination (post coming soon) <a href="https://t.co/fDtblUZRI8">https://t.co/fDtblUZRI8</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1142808835700252678?ref_src=twsrc%5Etfw">June 23, 2019</a></blockquote> </p> http://notes.eatonphil.com/compiler-basics-llvm-system-calls.htmlSat, 22 Jun 2019 00:00:00 +0000Writing an x86 emulator from scratch in JavaScript: 1. a stack and register machinehttp://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.html<p class="note"> Better yet, take a look at this post walking through emulating x86 ELF binaries in Go: <br /> <a href="/emulating-amd64-starting-with-elf.html">Emulating linux/AMD64 userland: interpreting an ELF binary</a> <br /> <br /> Next up in emulator basics: <!-- forgive me, for I have sinned --> <br /> <a href="/emulator-basics-system-calls.html">2. system calls</a> </p><p>In this post we'll create a small virtual machine in JavaScript and use it to run a simple C program compiled with GCC for an x86_64 (or AMD64) CPU running Linux.</p> <p><a href="https://github.com/eatonphil/x86e">All source code is available on GitHub.</a></p> <h3 id="virtual-machine-data-storage">Virtual machine data storage</h3><p>Our virtual machine will have two means of storing data: registers and an integer stack. Each register can store a 64-bit integer. 
The stack is an array of 8-bit (or 1 byte) integers.</p> <p>We'll make the following registers available for modification and use by the program(mer):</p> <div class="highlight"><pre><span></span><span class="nf">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSP</span><span class="p">,</span><span class="w"> </span><span class="nb">RBP</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RBX</span><span class="p">,</span><span class="w"> </span><span class="nb">RCX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDX</span><span class="p">,</span><span class="w"> </span><span class="nb">R8</span><span class="p">,</span><span class="w"> </span><span class="nb">R9</span><span class="p">,</span><span class="w"> </span><span class="nb">R10</span><span class="p">,</span><span class="w"> </span><span class="nb">R11</span><span class="p">,</span><span class="w"> </span><span class="nb">R12</span><span class="p">,</span><span class="w"> </span><span class="nb">R13</span><span class="p">,</span><span class="w"> </span><span class="nb">R14</span><span class="p">,</span><span class="w"> </span><span class="nb">R15</span> </pre></div> <p>The <code>RSP</code> register is used by the virtual machine for tracking the location of the last entry in the stack. It will be modified by the virtual machine when it encounters the <code>POP</code>, <code>PUSH</code>, <code>CALL</code> and <code>RET</code> instructions we'll support. 
We'll get into the specifics shortly.</p> <p>And we'll make the following registers available for use (but not modification) by the program(mer):</p> <div class="highlight"><pre><span></span><span class="nf">RIP</span><span class="p">,</span><span class="w"> </span><span class="nb">CS</span><span class="p">,</span><span class="w"> </span><span class="nb">DS</span><span class="p">,</span><span class="w"> </span><span class="nb">FS</span><span class="p">,</span><span class="w"> </span><span class="nb">SS</span><span class="p">,</span><span class="w"> </span><span class="nb">ES</span><span class="p">,</span><span class="w"> </span><span class="nb">GS</span><span class="p">,</span><span class="w"> </span><span class="nv">CF</span><span class="p">,</span><span class="w"> </span><span class="nv">ZF</span><span class="p">,</span><span class="w"> </span><span class="nv">PF</span><span class="p">,</span><span class="w"> </span><span class="nv">AF</span><span class="p">,</span><span class="w"> </span><span class="nv">SF</span><span class="p">,</span><span class="w"> </span><span class="nv">TF</span><span class="p">,</span><span class="w"> </span><span class="nv">IF</span><span class="p">,</span><span class="w"> </span><span class="nv">DF</span><span class="p">,</span><span class="w"> </span><span class="nv">OF</span> </pre></div> <p>Each of these has a special meaning but we'll focus on <code>RIP</code>. The <code>RIP</code> register contains the address of the instruction currently being interpreted by our virtual machine. After every instruction the virtual machine will increment the value in this register -- except for a few special instructions like <code>CALL</code> and <code>RET</code>.</p> <h4 id="memory-addresses">Memory addresses</h4><p>It will become useful to provide direct access to memory with a special syntax. 
We'll focus just on 64-bit addresses that will look like this:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="mi">12</span> </pre></div> <p>This asks for the value <code>12</code> to be written to memory starting at the address <code>RBP - 8</code>. The <code>QWORD PTR</code> part clarifies that we want to write 8 bytes' worth of the value. Since <code>12</code> fits in fewer than 8 bytes, the remaining bytes will be filled with zeros.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">]</span> </pre></div> <p>This asks for eight bytes starting from the memory address <code>RBP - 8</code> to be added to the value in <code>RAX</code> and stored back in <code>RAX</code>.</p> <h3 id="virtual-machine-instruction-set">Virtual machine instruction set</h3><p>In our virtual machine we'll define support for the following instructions:</p> <ul> <li><code>MOV $REGISTER, $REGISTER or $MEMORY ADDRESS or $LITERAL NUMBER</code><ul> <li>This instruction copies the second value into the first.</li> </ul> </li> <li><code>ADD $REGISTER, $REGISTER or $MEMORY ADDRESS</code><ul> <li>This instruction adds the second value to the first and stores the result in the first.</li> </ul> </li> <li><code>PUSH 
$REGISTER</code><ul> <li>This instruction will decrement the <code>RSP</code> register by 8 bytes and store the value at the bottom of the stack.</li> </ul> </li> <li><code>POP $REGISTER</code><ul> <li>This instruction will increment the <code>RSP</code> register by 8 bytes, remove the last element in the stack (at the bottom), and store it into the register.</li> </ul> </li> <li><code>CALL $LABEL</code><ul> <li>This instruction will push the value in the <code>RIP</code> register (plus one) onto the stack and set the <code>RIP</code> register to the line of code of the label. More on this later.</li> </ul> </li> <li><code>RET</code><ul> <li>This instruction will remove the value at the bottom of the stack and store it in the <code>RIP</code> register.</li> </ul> </li> </ul> <p>Now we have more than enough instructions to write some interesting programs for the virtual machine.</p> <h3 id="virtual-machine-semantics">Virtual machine semantics</h3><p>We'll make one last assumption before explaining further. In our programs, there must be a <code>main</code> label which must contain a <code>RET</code> instruction. 
Once we hit the terminal <code>RET</code>, we will exit the virtual machine and set the exit code to the value stored in the <code>RAX</code> register.</p> <p>Let's look at a simple program:</p> <div class="highlight"><pre><span></span><span class="nl">main:</span><span class="w"> </span><span class="c1">; the required main label</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="c1">; store 1 in RAX</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="c1">; store 2 in RDI</span> <span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span><span class="w"> </span><span class="c1">; store the result of adding RAX and RDI in RAX</span> <span class="w"> </span><span class="nf">RET</span><span class="w"> </span><span class="c1">; give control back to the virtual machine</span> </pre></div> <p>When we run this program, first we initialize a stack (we'll give it 1000 elements) and set the <code>RSP</code> register to 1000 (the top of the stack). Then we look for the <code>main</code> label and set the <code>RIP</code> register to 1, the line number after the label appears (0). Then until the <code>RIP</code> register is 1000 again, we interpret the instruction at the line number stored in the <code>RIP</code> register. 
Once the <code>RIP</code> register hits 1000, we exit the program setting <code>RAX</code> as the exit code.</p> <h4 id="one-more-example">One more example</h4><p>Now let's look at one more program:</p> <div class="highlight"><pre><span></span><span class="nl">plus:</span> <span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">RET</span> <span class="nl">main:</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span> <span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span> <span class="w"> </span><span class="nf">RET</span> </pre></div> <p>Our virtual machine will start at the line after the <code>main</code> label. Then it will store <code>1</code> into <code>RDI</code> and <code>2</code> into <code>RSI</code>. Then it will jump to the second line in the program to add <code>RDI</code> and <code>RSI</code> and store the result in <code>RDI</code>. Then it will copy <code>RDI</code> into <code>RAX</code> and return control to the final line. This last <code>RET</code> will in turn return control to the virtual machine. 
Then the program will exit with exit code <code>3</code>.</p> <h3 id="parsing">Parsing</h3><p>Now that we've finished describing our virtual machine language and semantics, we need to parse the instructions into a format we can easily interpret.</p> <p>To do this we'll iterate over the program, skipping any lines that start with a dot. These are directives that we can ignore for now. We'll also remove everything from a semicolon or hash character to the end of the line. These are comments.</p> <p>We'll store a dictionary mapping each label name (without the colon) to the position of the first instruction that follows the label.</p> <p>And we'll store the instructions as an array of objects composed of an operation and optional operands.</p> <h4 id="code">Code</h4><div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span 
class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span> <span class="w"> </span><span class="c1">// TODO: handle each line</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">labels</span><span class="p">,</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>First let's handle the directives we want to ignore:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span 
class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And then comments:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And then labels:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span 
class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span 
class="nx">includes</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;#&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span 
class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>And finally instruction parsing plus the rest:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">[</span><span class="nx">i</span><span class="p">].</span><span class="nx">trim</span><span class="p">();</span><span class="w"> </span><span class="c1">// Remove any trailing, leading whitespace</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;;&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">line</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">))</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">)[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operation</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="sr">/\s/</span><span class="p">)[</span><span class="mf">0</span><span class="p">].</span><span class="nx">toLowerCase</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">operands</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">operation</span><span class="p">.</span><span class="nx">length</span><span class="p">).</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;,&#39;</span><span class="p">).</span><span class="nx">map</span><span class="p">(</span><span class="nx">t</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span 
class="nx">t</span><span class="p">.</span><span class="nx">trim</span><span class="p">());</span> <span class="w"> </span><span class="nx">instructions</span><span class="p">.</span><span class="nx">push</span><span class="p">({</span> <span class="w"> </span><span class="nx">operation</span><span class="p">,</span> <span class="w"> </span><span class="nx">operands</span><span class="p">,</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">labels</span><span class="p">,</span><span class="w"> </span><span class="nx">instructions</span><span class="w"> </span><span class="p">};</span> <span class="p">}</span> </pre></div> <p>Hurray! A brittle parser.</p> <h3 id="interpreting">Interpreting</h3><p>We've already described the semantics a few times. So let's get started with the foundation and initialization.</p> <p>We'll use <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt">BigInt</a>s because regular JavaScript numbers can only represent integers exactly up to 53 bits, while the registers we are emulating are 64 bits wide. This isn't incredibly important in our basic programs but it will quickly become painful without them.</p> <p>And we'll make process memory available as an array of 8-bit integers. 
In order to make this easy to use, we'll also provide helper functions for writing to and reading from memory.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span> <span class="w"> </span><span class="s1">&#39;RDI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RSI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RSP&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RBP&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RBX&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RCX&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RDX&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RIP&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R8&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;R9&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R10&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R11&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R12&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R13&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;R14&#39;</span><span class="p">,</span><span class="w"> </span><span 
class="s1">&#39;R15&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;CS&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;DS&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;FS&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;SS&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ES&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;GS&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;CF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ZF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;PF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;AF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;SF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;TF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;IF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;DF&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;OF&#39;</span><span class="p">,</span> <span class="p">];</span> <span class="kd">function</span><span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">size</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">[</span><span class="nx">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">value</span><span class="w"> </span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">8n</span><span class="p">))</span><span class="w"> </span><span class="o">&amp;</span><span class="w"> </span><span class="mh">0xFFn</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span 
class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0n</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">size</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">|=</span><span class="w"> </span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">[</span><span class="nx">address</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">i</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="mi">0n</span><span class="p">)</span><span class="w"> </span><span class="o">&lt;&lt;</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="mi">8n</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO: interpret</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">file</span><span 
class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">memory</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="mf">10000</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">file</span><span class="p">).</span><span class="nx">toString</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">instructions</span><span class="p">,</span><span class="w"> </span><span class="nx">labels</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">code</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">registers</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">rs</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">({</span><span class="w"> </span><span class="p">...</span><span class="nx">rs</span><span class="p">,</span><span class="w"> </span><span class="p">[</span><span class="nx">r</span><span class="p">]</span><span 
class="o">:</span><span class="w"> </span><span class="mi">0n</span><span class="w"> </span><span class="p">}),</span><span class="w"> </span><span class="p">{});</span> <span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">labels</span><span class="p">.</span><span class="nx">main</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">undefined</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">labels</span><span class="p">.</span><span class="nx">_main</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">labels</span><span class="p">.</span><span class="nx">main</span><span class="p">);</span> <span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">memory</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">process</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">registers</span><span class="p">,</span> <span class="w"> </span><span class="nx">memory</span><span class="p">,</span> <span class="w"> </span><span class="nx">instructions</span><span class="p">,</span> <span class="w"> </span><span class="nx">labels</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span 
class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RAX</span><span class="p">);</span> <span class="p">}</span> <span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">]));</span> </pre></div> <p>We'll accept <code>_main</code> as an entry point as well as <code>main</code> to support our macOS users. 
If you know why our macOS users use <code>_main</code> I'd love to know.</p> <p>To interpret, we grab the instruction pointed to in <code>RIP</code> and switch on the operation.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">process</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">do</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">instruction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">instructions</span><span class="p">[</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="p">];</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operation</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;mov&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;add&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;call&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> 
</span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;ret&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;push&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;pop&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">!=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">memory</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">8</span><span class="p">),</span><span class="w"> </span><span class="mf">8</span><span class="p">));</span> <span class="p">}</span> </pre></div> <h4 id="interpreting-mov">Interpreting MOV</h4><p>Example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span 
class="nb">RDI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="kt">QWORD</span><span class="w"> </span><span class="nv">PTR</span><span class="w"> </span><span class="p">[</span><span class="nb">RBP</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">8</span><span class="p">],</span><span class="w"> </span><span class="mi">8</span> </pre></div> <p>This instruction will store a value into a register or address and increment <code>RIP</code>. If the left-hand side is a memory address we will write to memory.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;mov&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span 
class="mf">1</span><span class="p">]);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">lhs</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">rhs</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">lhs</span><span class="p">.</span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">rhs</span><span class="p">,</span><span class="w"> </span><span class="nx">lhs</span><span class="p">.</span><span class="nx">size</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <p>We're delegating to a helper function to handle registers vs. memory addresses vs. literals appropriately. 
Without memory addresses it's a simple function:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">value</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">value</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">.</span><span class="nx">asIntN</span><span class="p">(</span><span class="mf">64</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>We need to do some hacking to support memory addresses:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">false</span><span class="w"> </span><span class="p">})</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">REGISTERS</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">value</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span 
class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">value</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">startsWith</span><span class="p">(</span><span class="s1">&#39;QWORD PTR [&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">offsetString</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="s1">&#39;QWORD PTR [&#39;</span><span class="p">.</span><span class="nx">length</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">trim</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">offsetString</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">l</span><span class="p">,</span><span class="w"> </span><span class="nx">r</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">offsetString</span><span class="p">.</span><span 
class="nx">split</span><span class="p">(</span><span class="s1">&#39;-&#39;</span><span class="p">).</span><span class="nx">map</span><span class="p">(</span><span class="nx">l</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">l</span><span class="p">.</span><span class="nx">trim</span><span class="p">()));</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">address</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">l</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">r</span><span class="p">;</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">8</span><span class="p">;</span><span class="w"> </span><span class="c1">// qword is 8 bytes</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">lhs</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">size</span><span class="o">:</span><span class="w"> </span><span class="nx">bytes</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span 
class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">address</span><span class="p">,</span><span class="w"> </span><span class="nx">bytes</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Unsupported offset calculation: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nb">BigInt</span><span class="p">.</span><span class="nx">asIntN</span><span class="p">(</span><span class="mf">64</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">);</span> <span class="p">}</span> </pre></div> <h4 id="interpreting-add">Interpreting ADD</h4><p>Example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span> </pre></div> <p>This instruction will combine both registers and store the result in the first, then increment the <code>RIP</code> register.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;add&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span 
class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">rhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">1</span><span class="p">]);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">rhs</span><span class="p">;</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="interpreting-call">Interpreting CALL</h4><p>Example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span> </pre></div> <p>This instruction stores <code>RIP</code> (plus one, to continue after the call instruction) on the stack and sets <code>RIP</code> to the location specified by 
the label.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;call&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span> <span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">1n</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">];</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">labels</span><span class="p">[</span><span class="nx">label</span><span class="p">];</span> <span class="w"> </span><span 
class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="interpreting-ret">Interpreting RET</h4><p>Example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">RET</span> </pre></div> <p>This instruction removes the last element from the stack and stores it in the <code>RIP</code> register.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;ret&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="interpreting-push">Interpreting PUSH</h4><p>Example:</p> <div 
class="highlight"><pre><span></span><span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RAX</span> </pre></div> <p>This instruction stores the value in the register on the stack and increments <code>RIP</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;push&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">-=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span> <span class="w"> </span><span class="nx">writeMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span 
class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h4 id="interpreting-pop">Interpreting POP</h4><p>Example:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RAX</span> </pre></div> <p>This instruction removes the last element from the stack and stores it into the register specified. Then it increments <code>RIP</code>.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;pop&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lhs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretValue</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">instruction</span><span class="p">.</span><span class="nx">operands</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">lhs</span><span class="o">:</span><span class="w"> </span><span class="kc">true</span><span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">value</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">readMemoryBytes</span><span class="p">(</span><span class="nx">process</span><span class="p">,</span><span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="p">,</span><span class="w"> </span><span class="mf">8</span><span class="p">);</span> <span class="w"> </span><span 
class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RSP</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="mi">8n</span><span class="p">;</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">[</span><span class="nx">lhs</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">value</span><span class="p">;</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">registers</span><span class="p">.</span><span class="nx">RIP</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> </pre></div> <h3 id="all-together">All together</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test1.asm main:<span class="w"> </span><span class="p">;</span><span class="w"> </span>the<span class="w"> </span>required<span class="w"> </span>main<span class="w"> </span>label <span class="w"> </span>MOV<span class="w"> </span>RAX,<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>store<span class="w"> </span><span class="m">1</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>RAX <span class="w"> </span>MOV<span class="w"> </span>RDI,<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="p">;</span><span class="w"> </span>store<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="k">in</span><span class="w"> </span>RDI <span class="w"> </span>ADD<span class="w"> </span>RAX,<span class="w"> </span>RDI<span class="w"> </span><span class="p">;</span><span class="w"> 
</span>store<span class="w"> </span>the<span class="w"> </span>result<span class="w"> </span>of<span class="w"> </span>adding<span class="w"> </span>RAX<span class="w"> </span>and<span class="w"> </span>RDI<span class="w"> </span><span class="k">in</span><span class="w"> </span>RAX <span class="w"> </span>RET<span class="w"> </span><span class="p">;</span><span class="w"> </span>give<span class="w"> </span>control<span class="w"> </span>back<span class="w"> </span>to<span class="w"> </span>the<span class="w"> </span>virtual<span class="w"> </span>machine $<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>test1.asm $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">3</span> </pre></div> <p>And finally, let's see what we can do with a simple C program:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>plus.c long<span class="w"> </span>main<span class="o">()</span><span class="w"> </span><span class="o">{</span> <span class="w"> </span>long<span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">;</span> <span class="w"> </span>long<span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">6</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span>a<span class="w"> </span>+<span class="w"> </span>b<span class="p">;</span> <span class="o">}</span> $<span class="w"> </span>gcc<span class="w"> </span>-S<span class="w"> </span>-masm<span class="o">=</span>intel<span class="w"> </span>-o<span class="w"> </span>plus.s<span class="w"> </span>plus.c $<span class="w"> </span>node<span class="w"> </span>emulator.js<span class="w"> </span>plus.s $<span class="w"> </span><span 
class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">11</span> </pre></div> <p>And we've got the start of a working x86_64/AMD64 emulator.</p> <h3 id="next-steps">Next steps</h3><p>We aren't setting flags appropriately to support conditionals, so that's low-hanging fruit. Syscalls also open up a new world (one we'll end up needing, since exit codes are limited to 8 bits of information). Our parsing is brittle too: dealing with ELF files directly may be a better direction and would also enable more. We'll explore these aspects and others in follow-up posts.</p> <h3 id="human-interest">Human interest</h3><p>I originally intended to build a GameBoy emulator because the hardware is simple and uniform. But I found it easiest to start hacking together an AMD64 emulator because AMD64 is well-documented and gcc is easy enough to use. I'm still interested in the GameBoy, though, unless/until I figure out how to emulate a graphics card for AMD64.</p> <p>It's tricky! But not that tricky. I built a <a href="https://github.com/eatonphil/x86e">graphical debugger</a> around this emulator to help out with the logic and off-by-one errors. But ultimately it's been surprising to me how easy it is to get started -- especially when I'm not concerned about absolute correctness (yet).</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Here&#39;s my first post on a series on emulator basics. It&#39;s baby&#39;s first stack and register virtual machine and it turns out it runs x86 code. 
<a href="https://t.co/WiWmGedawt">https://t.co/WiWmGedawt</a> <a href="https://twitter.com/hashtag/linux?src=hash&amp;ref_src=twsrc%5Etfw">#linux</a> <a href="https://twitter.com/hashtag/assembly?src=hash&amp;ref_src=twsrc%5Etfw">#assembly</a> <a href="https://t.co/xjiMkhgpdN">https://t.co/xjiMkhgpdN</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1132036835964870657?ref_src=twsrc%5Etfw">May 24, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/emulator-basics-a-stack-and-register-machine.htmlTue, 21 May 2019 00:00:00 +0000Tail call eliminationhttp://notes.eatonphil.com/tail-call-elimination.html<p>In this post we'll explore what tail calls are, why they are useful, and how they can be eliminated in an interpreter, a compiler targeting C++, and a compiler targeting LLVM IR.</p> <h3 id="tail-calls">Tail calls</h3><p>A tail call is a function call made at the end of a block that returns the value of the call (some languages do not force this <code>return</code> requirement). 
Here are a few examples.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx1</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Loops forever but is a tail call.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx1</span><span class="p">();</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx3</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">x</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">tailCallEx3</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> 
</span><span class="o">||</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">tailCallEx4</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="mf">0</span><span class="o">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">tailCallEx4</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And here are some examples of non-tail calls.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">nonTailCallEx1</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Not a tail call because the call is not the value returned.</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span 
class="o">+</span><span class="w"> </span><span class="nx">nonTailCallEx1</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">nonTailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">x</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">r</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">nonTailCallEx2</span><span class="p">(</span><span class="nx">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Not a tail call because the value is not *immediately* returned.</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">r</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">r</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="p">}</span> </pre></div> <h3 id="why-is-this-important?">Why is this important?</h3><p>Some languages can rewrite a recursive tail call as 
a jump/branch/goto instead of a function call. This allows:</p> <ol> <li>Potential performance gain if function calls have large overhead</li> <li>No stack overflows due to no nested function call stacks</li> </ol> <h3 id="implementation-1:-interpreter">Implementation 1: Interpreter</h3><p>Given a tail call recursive fibonacci:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span 
class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Here is how we could transform this (by hand) to remove the tail call.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fibonacci</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">a1</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">b1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">n1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">a1</span><span class="p">;</span> <span class="w"> </span><span class="nx">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">b1</span><span class="p">;</span> <span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">n1</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>If this were written in a language with labels and goto, we could simplify the code slightly by jumping back to the top of the function instead. But it has the same effect as a loop.</p> <p>Since we're in an interpreter (that isn't JIT compiling), we cannot pick between these two forms ahead of time and must merge them. So we put every function body in a loop and break out of it if the body does not end in a tail call. 
Otherwise we line up the parameters and let the loop take us back.</p> <p>Here is an example of this strategy used in a <a href="https://github.com/eatonphil/bsdscheme">Scheme interpreter</a> written in D.</p> <div class="highlight"><pre><span></span><span class="c1">// Define a new function with name `name` and add it to the context.</span> <span class="n">Value</span><span class="w"> </span><span class="n">namedLambda</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">ctx</span><span class="p">,</span><span class="w"> </span><span class="nb">string</span><span class="w"> </span><span class="n">name</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">funArguments</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">car</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">funBody</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">cdr</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">defined</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">parameters</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="p">**</span><span class="w"> </span><span class="n">rest</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">newCtx</span><span
class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">ctx</span><span class="p">.</span><span class="n">dup</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Copy the runtime calling context to the new context.</span> <span class="w"> </span><span class="n">Context</span><span class="w"> </span><span class="n">runtimeCtx</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">cast</span><span class="p">(</span><span class="n">Context</span><span class="p">)(*</span><span class="n">rest</span><span class="p">);</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">runtimeCallingContext</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">runtimeCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="p">;</span> <span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">runtimeCallingContext</span><span class="p">.</span><span class="n">dup</span><span class="p">;</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="w"> </span><span class="c1">// Loop forever, will break immediately if not a tail call</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">tailCalling</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span 
class="p">(</span><span class="n">valueIsList</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">keyTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">funArguments</span><span class="p">);</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">valueTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">parameters</span><span class="p">);</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="kc">true</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">value</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueTmp</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> </span><span class="n">value</span><span class="p">);</span> <span class="w"> </span><span class="c1">// TODO: handle arg count mismatch</span> <span class="w"> </span><span class="k">if</span><span class="w"> 
</span><span class="p">(</span><span class="n">valueIsList</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">keyTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">keyTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="n">valueTmp</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">valueTmp</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">valueIsSymbol</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">key</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">funArguments</span><span class="p">);</span> <span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">set</span><span class="p">(</span><span class="n">key</span><span class="p">,</span><span class="w"> 
</span><span class="n">car</span><span class="p">(</span><span class="n">parameters</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(!</span><span class="n">valueIsNil</span><span class="p">(</span><span class="n">funArguments</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">error</span><span class="p">(</span><span class="s">&quot;Expected symbol or list in lambda formals&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">funArguments</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(!</span><span class="n">tailCalling</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">callingContext</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">Tuple</span><span class="p">!(</span><span class="nb">string</span><span class="p">,</span><span class="w"> </span><span class="n">Delegate</span><span class="p">)(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span class="p">&amp;</span><span class="n">defined</span><span class="p">));</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">result</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">eval</span><span class="p">(</span><span class="n">withBegin</span><span class="p">(</span><span class="n">funBody</span><span class="p">),</span><span class="w"> </span><span class="k">cast</span><span class="p">(</span><span class="kt">void</span><span 
class="p">**)[</span><span class="n">newCtx</span><span class="p">]);</span> <span class="w"> </span><span class="c1">// In a tail call, let the loop carry us back.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">newCtx</span><span class="p">.</span><span class="n">doTailCall</span><span class="w"> </span><span class="p">==</span><span class="w"> </span><span class="p">&amp;</span><span class="n">defined</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">tailCalling</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="n">parameters</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="w"> </span><span class="n">newCtx</span><span class="p">.</span><span class="n">doTailCall</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="kc">null</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span><span class="w"> </span><span class="c1">// Not in a tail call, we&#39;re done a regular function call.</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">result</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">makeFunctionValue</span><span class="p">(</span><span class="n">name</span><span class="p">,</span><span class="w"> </span><span 
class="p">&amp;</span><span class="n">defined</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p class="note"> We cannot eliminate mutually recursive tail calls with this method. We could use continuation-passing style but that would not address the underlying concern: avoiding a function call. </p><h3 id="implementation-2:-compiling-to-c++">Implementation 2: Compiling to C++</h3><p>The strategy here is the same as in the interpreter, except that since tail-recursive functions are known at compile time, we can generate specialized (non-generalized) code in function bodies.</p> <p>Here is how a <a href="https://github.com/eatonphil/jsc">JavaScript compiler</a> transforms the above fibonacci implementation into C++:</p> <div class="highlight"><pre><span></span><span class="nb nb-Type">void</span><span class="w"> </span><span class="n">tco_fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="o">&amp;</span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="w"> </span><span class="o">*</span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">tco_n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span> <span class="w"> </span><span
class="n">double</span><span class="w"> </span><span class="n">tco_a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="n">double</span><span class="w"> </span><span class="n">tco_b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span> <span class="n">tail_recurse_1</span><span class="p">:</span> <span class="w"> </span><span class="p">;</span> <span class="w"> </span><span class="nb nb-Type">bool</span><span class="w"> </span><span class="n">sym_if_test_58</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_if_test_58</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetReturnValue</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_a</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nb nb-Type">bool</span><span class="w"> </span><span 
class="n">sym_if_test_70</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_if_test_70</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">args</span><span class="o">.</span><span class="n">GetReturnValue</span><span class="p">()</span><span class="o">.</span><span class="n">Set</span><span class="p">(</span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_b</span><span class="p">));</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_83</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">tco_n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_92</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span 
class="p">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="n">tco_a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">tco_b</span><span class="p">));</span> <span class="w"> </span><span class="n">tco_n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">sym_arg_83</span><span class="p">);</span> <span class="w"> </span><span class="n">tco_a</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tco_b</span><span class="p">;</span> <span class="w"> </span><span class="n">tco_b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">toNumber</span><span class="p">(</span><span class="n">sym_arg_92</span><span class="p">);</span> <span class="w"> </span><span class="n">goto</span><span class="w"> </span><span class="n">tail_recurse_1</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>This is implemented by checking every function call. If the function call is in tail call position, we generate code for jumping to the beginning of the function. 
Otherwise, we generate a call as usual.</p> <p>Here is how the tail call check and code-generation is done in the <a href="https://github.com/eatonphil/jsc/blob/master/src/compile/compile.ts#L186">compiler</a>:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span> <span class="w"> </span><span class="nx">context</span><span class="o">:</span><span class="w"> </span><span class="kt">Context</span><span class="p">,</span> <span class="w"> </span><span class="nx">destination</span><span class="o">:</span><span class="w"> </span><span class="kt">Local</span><span class="p">,</span> <span class="w"> </span><span class="nx">ce</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.CallExpression</span><span class="p">,</span> <span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">tcoLabel</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">tcoParameters</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">identifier</span><span class="p">(</span><span class="nx">ce</span><span 
class="p">.</span><span class="nx">expression</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">locals</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">mangle</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="kr">module</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">safe</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safe</span><span class="p">.</span><span class="nx">getCode</span><span class="p">()])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">safe</span><span class="p">.</span><span class="nx">getCode</span><span class="p">();</span> <span class="w"> </span><span class="nx">tcoLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safeName</span><span class="p">].</span><span class="nx">label</span><span class="p">;</span> <span class="w"> </span><span 
class="nx">tcoParameters</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tco</span><span class="p">[</span><span class="nx">safeName</span><span class="p">].</span><span class="nx">parameters</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">.</span><span class="nx">kind</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">identifier</span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">mangled</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">mangle</span><span class="p">(</span><span class="nx">context</span><span class="p">.</span><span class="kr">module</span><span class="nx">Name</span><span class="p">,</span><span class="w"> </span><span class="nx">id</span><span class="p">);</span> <span class="w"> </span><span 
class="kd">const</span><span class="w"> </span><span class="nx">safe</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">locals</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">mangled</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">safe</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">tcoLabel</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">compileParameter</span><span class="p">(</span> <span class="w"> </span><span class="nx">context</span><span class="p">,</span> <span class="w"> </span><span class="nx">tcoParameters</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span> <span class="w"> </span><span class="nx">i</span><span class="p">,</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span> <span class="w"> </span><span class="nx">arg</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span 
class="p">});</span> <span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">emitStatement</span><span class="p">(</span><span class="sb">`goto </span><span class="si">${</span><span class="nx">tcoLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="s1">&#39;&#39;</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">);</span> <span class="w"> </span><span class="nx">destination</span><span class="p">.</span><span class="nx">tce</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="c1">// Otherwise generate regular function call</span> </pre></div> <p>This requires you to have been building up the state throughout the AST to know whether or not any particular call is in tail position.</p> <h3 id="implementation-3:-compiling-to-llvm-ir">Implementation 3: Compiling to LLVM IR</h3><p>LLVM IR is the most boring because all you do is mark any tail call as being a tail call. 
Then so long as the call meets some <a href="https://llvm.org/docs/LangRef.html#id320">requirements</a>, the key one being that the result of the call must be returned immediately, LLVM will generate a jump instead of a call for you.</p> <p>Given the following lisp-y implementation of the same tail call recursive fibonacci function (compiler <a href="https://github.com/eatonphil/ulisp">here</a>):</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">fib</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="nv">n</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="nv">a</span> <span class="w"> </span><span class="p">(</span><span class="nv">fib</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">n</span><span class="w"> </span><span class="mi">1</span><span class="p">))))</span> </pre></div> <p>We generate the following LLVM IR:</p> <div class="highlight"><pre><span></span><span class="k">define</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="vg">@fib</span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span 
class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">%ifresult13</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">alloca</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="w"> </span><span class="nv">%sym14</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym15</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym12</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">eq</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym14</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym15</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%sym12</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue16</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> 
</span><span class="nv">%iffalse17</span> <span class="nl">iftrue16:</span> <span class="w"> </span><span class="nv">%sym18</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="k">store</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym18</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifend19</span> <span class="nl">iffalse17:</span> <span class="w"> </span><span class="nv">%sym21</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym23</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym24</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span 
class="m">0</span> <span class="w"> </span><span class="nv">%sym22</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym23</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym24</span> <span class="w"> </span><span class="nv">%sym26</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym27</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym25</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">sub</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym26</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym27</span> <span class="w"> </span><span class="c">; NOTE the `tail` before `call` here</span> <span class="w"> </span><span class="nv">%sym20</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">tail</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="vg">@fib</span><span class="p">(</span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym21</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym22</span><span class="p">,</span><span 
class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym25</span><span class="p">)</span> <span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym20</span> <span class="w"> </span><span class="k">store</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym20</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifend19</span> <span class="nl">ifend19:</span> <span class="w"> </span><span class="nv">%sym11</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">load</span><span class="w"> </span><span class="kt">i64</span><span class="p">,</span><span class="w"> </span><span class="kt">i64</span><span class="p">*</span><span class="w"> </span><span class="nv">%ifresult13</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i64</span><span class="w"> </span><span class="nv">%sym11</span> <span class="p">}</span> </pre></div> <p>The only difference between supporting tail call elimination and not is whether the <code>call</code> instruction is preceded by a <code>tail</code> directive. 
That makes the <a href="https://github.com/eatonphil/ulisp/blob/master/src/backend/llvm.js#L198">implementation</a> very simple:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">isTailCall</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">TAIL_CALL_ENABLED</span><span class="w"> </span><span class="o">&amp;&amp;</span> <span class="w"> </span><span class="nx">context</span><span class="p">.</span><span class="nx">tailCallTree</span><span class="p">.</span><span class="nx">includes</span><span class="p">(</span><span class="nx">validFunction</span><span class="p">.</span><span class="nx">value</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">maybeTail</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">isTailCall</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="s1">&#39;tail &#39;</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb"> = </span><span class="si">${</span><span class="nx">maybeTail</span><span class="si">}</span><span class="sb">call </span><span class="si">${</span><span class="nx">validFunction</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> @</span><span class="si">${</span><span 
class="nx">validFunction</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeArgs</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span> <span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">isTailCall</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`ret </span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">type</span><span class="si">}</span><span class="sb"> %</span><span class="si">${</span><span class="nx">destination</span><span class="p">.</span><span class="nx">value</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="p">}</span> </pre></div> <h4 id="generated-assembly">Generated assembly</h4><p>The resulting generated code (run through <a href="https://llvm.org/docs/CommandGuide/llc.html">llc</a>) for that call will be:</p> <div class="highlight"><pre><span></span><span class="na">...</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="no">rax</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="no">rdx</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rdi</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rsi</span><span class="p">,</span><span class="w"> </span><span class="no">rax</span> <span class="w"> </span><span 
class="nf">jmp</span><span class="w"> </span><span class="no">_fib</span><span class="w"> </span><span class="c1">## TAILCALL</span> <span class="na">...</span> </pre></div> <p>And if tail call elimination is disabled:</p> <div class="highlight"><pre><span></span><span class="na">...</span> <span class="w"> </span><span class="nf">add</span><span class="w"> </span><span class="no">rax</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span> <span class="w"> </span><span class="nf">dec</span><span class="w"> </span><span class="no">rdx</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rdi</span><span class="p">,</span><span class="w"> </span><span class="no">rsi</span> <span class="w"> </span><span class="nf">mov</span><span class="w"> </span><span class="no">rsi</span><span class="p">,</span><span class="w"> </span><span class="no">rax</span> <span class="w"> </span><span class="nf">call</span><span class="w"> </span><span class="no">_fib</span> <span class="na">...</span> </pre></div> <h3 id="summary">Summary</h3><p>The last bit I haven't covered is how you track whether or not a call is in tail position. That is difficult to cover in a blog post because it comes down to propagating, or not propagating, the tail state at each type of syntax node. But generally speaking, if the syntax node is not in tail position (e.g. not the last expression in a block), you drop the tail state you've built up. When you make a function call, you add the function name to the tail state.</p> <p>But I will be covering this in detail in the LLVM case in the next post in my <a href="http://notes.eatonphil.com/compiler-basics-llvm-conditionals.html">compiler basics</a> series.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Put together a survey and summary of tail call elimination, the effect and implementation, in an interpreter, a compiler targeting C++, and a compiler targeting LLVM IR. 
<a href="https://t.co/pXiLoXjw2u">https://t.co/pXiLoXjw2u</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1128640717679734784?ref_src=twsrc%5Etfw">May 15, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/tail-call-elimination.htmlTue, 14 May 2019 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 4. LLVM conditionals and compiling fibonaccihttp://notes.eatonphil.com/compiler-basics-llvm-conditionals.html<p class="note"> Previously in compiler basics: <!-- forgive me, for I have sinned --> <br /> <a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a> <br /> <a href="/compiler-basics-functions.html">2. user-defined functions and variables</a> <br /> <a href="/compiler-basics-llvm.html">3. LLVM</a> <br /> Next in compiler basics: <br /> <a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a> <br /> <a href="/compiler-basics-an-x86-upgrade.html">6. 
an x86 upgrade</a> </p><p>In this post we'll extend the <a href="https://github.com/eatonphil/ulisp">compiler</a>'s LLVM backend to support compiling conditionals such that we can support an implementation of the fibonacci algorithm.</p> <p>Specifically we're aiming for the following:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>tests/fib.lisp <span class="o">(</span>def<span class="w"> </span>fib<span class="w"> </span><span class="o">(</span>n<span class="o">)</span> <span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(</span>&lt;<span class="w"> </span>n<span class="w"> </span><span class="m">2</span><span class="o">)</span> <span class="w"> </span>n <span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>n<span class="w"> </span><span class="m">1</span><span class="o">))</span><span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>n<span class="w"> </span><span class="m">2</span><span class="o">)))))</span> <span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span> <span class="w"> </span><span class="o">(</span>fib<span class="w"> </span><span class="m">8</span><span class="o">))</span> $<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp $<span class="w"> </span>./build/prog $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">21</span> </pre></div> <p>To do this we'll have to add the <code>&lt;</code>, <code>-</code> and <code>if</code> built-ins.</p> <p><a href="https://github.com/eatonphil/ulisp">All source code is available on GitHub</a>.</p> <h3 id="subtraction">Subtraction</h3><p>This is the easiest to 
add since we already support addition. They are both arithmetic operations that produce an integer. We simply add a mapping of <code>-</code> to the LLVM instruction <code>sub</code> so our LLVM backend constructor (<code>src/backends/llvm.js</code>) looks like this:</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;if&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> 
</span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="p">...</span> </pre></div> <h3 id="less-than">Less than</h3><p>The <code>&lt;</code> built-in is a comparison operation. These are handled differently from arithmetic operations in LLVM IR. A comparison looks like this:</p> <div class="highlight"><pre><span></span><span class="nv nv-Anonymous">%3</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%2</span> </pre></div> <p>This says that we're doing an integer comparison, <code>icmp</code>, (with signed less than, <code>slt</code>) on the <code>i32</code> integers in variables <code>%1</code> and <code>%2</code>.</p> <p>We can shim this into our existing <code>compileOp</code> helper like so:</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;if&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span 
class="s1">&#39;icmp slt&#39;</span><span class="p">),</span> <span class="p">...</span> </pre></div> <h3 id="conditionals">Conditionals</h3><p>The last part we need to add is support for conditional execution of code at runtime. Assembly-like languages handle this with "jumps" and "labels". Jumping causes execution to continue at the address being jumped to (instead of just the line following the jump instruction). Labels give you a way of naming an address instead of having to calculate it yourself. Our code will look vaguely like this:</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">%test</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%test</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iffalse</span> <span class="nl">iftrue:</span> <span class="w"> </span><span class="c">; do true stuff</span> <span class="nl">iffalse:</span> <span class="w"> </span><span class="c">; do false stuff</span> <span class="w"> </span><span class="c">; do next stuff</span> </pre></div> <p>The <code>br</code> instruction can jump (or branch) conditionally or unconditionally. This snippet demonstrates a conditional jump.</p> <p>But there are a few things wrong with this pseudo-code. First off if the condition is true, execution will just continue on into the false section once finished. 
Second, LLVM IR requires every labeled block to end with a terminator instruction such as a branch (<code>br</code>) or a return (<code>ret</code>). So we'll add a new label after the true and false sections called <code>ifresult</code> and jump to it unconditionally after both.</p> <div class="highlight"><pre><span></span><span class="w"> </span><span class="nv">%test</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">icmp</span><span class="w"> </span><span class="k">slt</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%n</span><span class="p">,</span><span class="w"> </span><span class="nv nv-Anonymous">%1</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">i1</span><span class="w"> </span><span class="nv">%test</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iftrue</span><span class="p">,</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%iffalse</span> <span class="nl">iftrue:</span> <span class="w"> </span><span class="c">; do true stuff</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifresult</span> <span class="nl">iffalse:</span> <span class="w"> </span><span class="c">; do false stuff</span> <span class="w"> </span><span class="k">br</span><span class="w"> </span><span class="kt">label</span><span class="w"> </span><span class="nv">%ifresult</span> <span class="nl">ifresult:</span> <span class="w"> </span><span class="c">; do next stuff</span> </pre></div> <h3 id="scope">Scope</h3><p>One last thing we'll need to do before implementing the code generation for this is to update our <code>Scope</code> class to accept symbol prefixes so we can pass our labels through Scope to make sure they are unique but still have useful names.</p> <div 
class="highlight"><pre><span></span><span class="p">...</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">symbol</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;sym&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">prefix</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">nth</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">...</span> </pre></div> <h3 id="compileif">compileIf</h3><p>Now we can add a primitive function mapping <code>if</code> to a new <code>compileIf</code> helper and implement the helper.</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span 
class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;&lt;&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;icmp slt&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;if&#39;</span><span 
class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileIf</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="p">...</span> <span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">elseBlock</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">testVariable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Compile expression and branch</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">testVariable</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;iftrue&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">falseLabel</span><span 
class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;iffalse&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`br i1 %</span><span class="si">${</span><span class="nx">testVariable</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">trueLabel</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">falseLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile true section</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;ifend&#39;</span><span class="p">);</span> 
<span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;br label %&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile false section</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">elseBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;br label %&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile cleanup</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> 
</span><span class="p">}</span> <span class="p">...</span> </pre></div> <p>Note that this code generation sends the <code>destination</code> variable into both the true and false sections. Let's try it out.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp llc:<span class="w"> </span>error:<span class="w"> </span>llc:<span class="w"> </span>build/prog.ll:19:3:<span class="w"> </span>error:<span class="w"> </span>multiple<span class="w"> </span>definition<span class="w"> </span>of<span class="w"> </span><span class="nb">local</span><span class="w"> </span>value<span class="w"> </span>named<span class="w"> </span><span class="s1">&#39;sym5&#39;</span> <span class="w"> </span>%sym5<span class="w"> </span><span class="o">=</span><span class="w"> </span>add<span class="w"> </span>i32<span class="w"> </span>%sym15,<span class="w"> </span>%sym16 <span class="w"> </span>^ child_process.js:665 <span class="w"> </span>throw<span class="w"> </span>err<span class="p">;</span> <span class="w"> </span>^ Error:<span class="w"> </span>Command<span class="w"> </span>failed:<span class="w"> </span>llc<span class="w"> </span>-o<span class="w"> </span>build/prog.s<span class="w"> </span>build/prog.ll llc:<span class="w"> </span>error:<span class="w"> </span>llc:<span class="w"> </span>build/prog.ll:19:3:<span class="w"> </span>error:<span class="w"> </span>multiple<span class="w"> </span>definition<span class="w"> </span>of<span class="w"> </span><span class="nb">local</span><span class="w"> </span>value<span class="w"> </span>named<span class="w"> </span><span class="s1">&#39;sym5&#39;</span> <span class="w"> </span>%sym5<span class="w"> </span><span class="o">=</span><span class="w"> </span>add<span class="w"> </span>i32<span class="w"> </span>%sym15,<span class="w"> </span>%sym16 </pre></div> <p>That's annoying.
An unfortunate aspect of LLVM's required static single assignment (SSA) form is that a variable name can be assigned only once within a function, even when the two assignments sit in branches that can never both run.</p> <p>To work around this we need to allocate memory on the stack, store the result of each true/false section in this location, and read from this location afterward into the destination variable.</p> <h3 id="stack-memory-instructions">Stack memory instructions</h3><p>LLVM IR gives us <code>alloca</code> to allocate memory on the stack, <code>store</code> to store a value at a stack address, and <code>load</code> to read the value at a stack address into a variable. Here's a simple example:</p> <div class="highlight"><pre><span></span><span class="nv">%myvar</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">42</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="nv">%stackaddress</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">alloca</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="k">store</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%myvar</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="p">*</span><span class="w"> </span><span class="nv">%stackaddress</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> <span class="nv">%newvar</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">load</span><span class="w"> </span><span class="kt">i32</span><span class="p">,</span><span
class="w"> </span><span class="kt">i32</span><span class="p">*</span><span class="w"> </span><span class="nv">%stackaddress</span><span class="p">,</span><span class="w"> </span><span class="k">align</span><span class="w"> </span><span class="m">4</span> </pre></div> <p>Such that <code>newvar</code> is now 42.</p> <h3 id="compileif-again">compileIf again</h3><p>Applying this back to our <code>compileIf</code> helper gives us:</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="w"> </span><span class="nx">compileIf</span><span class="p">([</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">elseBlock</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">testVariable</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;ifresult&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Space for result</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span 
class="nx">result</span><span class="si">}</span><span class="sb"> = alloca i32, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile expression and branch</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">test</span><span class="p">,</span><span class="w"> </span><span class="nx">testVariable</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;iftrue&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;iffalse&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`br i1 %</span><span class="si">${</span><span class="nx">testVariable</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">trueLabel</span><span class="si">}</span><span class="sb">, label %</span><span class="si">${</span><span class="nx">falseLabel</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile true section</span> <span class="w"> </span><span 
class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">trueLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">thenBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp1</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store i32 %</span><span class="si">${</span><span class="nx">tmp1</span><span class="si">}</span><span class="sb">, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(</span><span class="s1">&#39;ifend&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span 
class="w"> </span><span class="s1">&#39;br label %&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">falseLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile false section</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tmp2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">elseBlock</span><span class="p">,</span><span class="w"> </span><span class="nx">tmp2</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`store i32 %</span><span class="si">${</span><span class="nx">tmp2</span><span class="si">}</span><span class="sb">, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;br label 
%&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">endLabel</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Compile cleanup</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="nx">endLabel</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = load i32, i32* %</span><span class="si">${</span><span class="nx">result</span><span class="si">}</span><span class="sb">, align 4`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">...</span> </pre></div> <h3 id="trying-it-out">Trying it out</h3><p>We run our compiler one more time:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/fib.lisp $<span class="w"> </span>./build/prog $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">21</span> </pre></div> <p>And get what we expect!</p> <h3 id="next-up">Next up</h3><ul> <li>Tail call optimization</li> <li>Lists and dynamic memory</li> <li>Strings?</li> <li>Foreign function calls?</li> <li>Self-hosting?</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Latest post in the compiler basics series: using LLVM conditionals in compiling a fibonacci program <a 
href="https://t.co/A72yEDQ8sd">https://t.co/A72yEDQ8sd</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1125072731408666624?ref_src=twsrc%5Etfw">May 5, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/compiler-basics-llvm-conditionals.htmlSat, 04 May 2019 00:00:00 +0000Responsibility and ownershiphttp://notes.eatonphil.com/responsibility-and-ownership.html<p>Responsibility is only possible by granting ownership and setting expectations. If you don't turn over ownership, don't expect folks to take responsibility. When you grant ownership and set expectations, you'll be astounded what folks will accomplish without you.</p> <p>I am astounded.</p> http://notes.eatonphil.com/responsibility-and-ownership.htmlTue, 30 Apr 2019 00:00:00 +0000Interpreting TypeScripthttp://notes.eatonphil.com/interpreting-typescript.html<p>In addition to providing a static type system and compiler for a superset of JavaScript, TypeScript makes much of its functionality available programmatically. In this post we'll use the TypeScript compiler API to build an interpreter. 
We'll build off of a <a href="https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API">TypeScript wiki article</a> and cover a few areas that were confusing to me as I built out <a href="https://github.com/eatonphil/jsc">a separate project</a>.</p> <p>The end result we're building will look like this:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts<span class="w"> </span><span class="c1"># A program we can interpret</span> print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">5</span><span class="o">)</span><span class="p">;</span> $<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts<span class="w"> </span><span class="c1"># Build the source code for the interpreter</span> $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts<span class="w"> </span><span class="c1"># Run the interpreter against the test program</span> <span class="m">6</span> </pre></div> <p><a href="https://github.com/eatonphil/jsi">All code is available on GitHub.</a></p> <h3 id="setup">Setup</h3><p>To begin with, we need Node.js and some dependencies:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>yarn<span class="w"> </span>add<span class="w"> </span>typescript<span class="w"> </span>@types/node </pre></div> <p>Then we can begin the first stage of an interpreter: parsing the code.</p> <h3 id="parsing">Parsing</h3><p>Parsing a fixed set of files is simple enough. We pass a list of files to <code>createProgram</code> along with compiler options. But as users we don't want to keep track of every file a program uses (i.e. everything we import). Ideally we pass a single-file entrypoint (something like a main.js) and have our interpreter figure out all the imports and handle them recursively.
More on this later; for now we'll just parse the single-file entrypoint.</p> <div class="highlight"><pre><span></span><span class="k">import</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="w"> </span><span class="kr">from</span><span class="w"> </span><span class="s1">&#39;typescript&#39;</span><span class="p">;</span> <span class="kd">const</span><span class="w"> </span><span class="nx">TS_COMPILER_OPTIONS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">allowNonTsExtensions</span><span class="o">:</span><span class="w"> </span><span class="kt">true</span><span class="p">,</span> <span class="p">};</span> <span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">fileName</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="o">:</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Program</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">createProgram</span><span class="p">([</span><span class="nx">fileName</span><span class="p">],</span><span class="w"> </span><span class="nx">TS_COMPILER_OPTIONS</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* TODO */</span><span class="w"> </span><span class="p">}</span> <span
class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">entrypoint</span><span class="o">:</span><span class="w"> </span><span class="kt">string</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">entrypoint</span><span class="p">);</span> <span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="p">);</span> <span class="p">}</span> <span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">[</span><span class="mf">2</span><span class="p">]);</span> </pre></div> <h3 id="interpret-and-ts.program">interpret and ts.Program</h3><p>A program contains all source files as well as any implicitly needed TypeScript definition files (for us it will just be the TypeScript definitions for the Node.js standard library).</p> <p class="note"> The program also gives us access to a type checker that we can use to query the type of any node in the program tree. We'll get into this in another post. 
</p><p>Our <code>interpret</code> function will iterate over all the source files, skipping the TypeScript definition files, and call <code>interpretNode</code> on each top-level node of every remaining source file.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="cm">/* TODO */</span><span class="w"> </span><span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">interpret</span><span class="p">(</span><span class="nx">program</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">getSourceFiles</span><span class="p">().</span><span class="nx">map</span><span class="p">((</span><span class="nx">source</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">fileName</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">source</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">fileName</span><span class="p">.</span><span class="nx">endsWith</span><span class="p">(</span><span class="s1">&#39;.d.ts&#39;</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span
class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">results</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">forEachChild</span><span class="p">(</span><span class="nx">source</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">results</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="p">));</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">results</span><span class="p">;</span> <span class="w"> </span><span class="p">});</span> <span class="p">}</span> </pre></div> <h3 id="interpretnode-and-ts.node">interpretNode and ts.Node</h3><p>A Node is a wrapper for most elements of what we consider a program to be, such as a binary expression (<code>2 + 3</code>), a literal expression (<code>2</code>), a function call expression (<code>a(c)</code>), and so forth. 
When exploring a parser, it takes time to become familiar with the particular way it breaks a program down into a tree of nodes.</p> <p>As a concrete example, the following program:</p> <div class="highlight"><pre><span></span><span class="nx">print</span><span class="p">(</span><span class="nx">a</span><span class="p">);</span> </pre></div> <p>Will be built into a ts.Node tree along these lines:</p> <div class="highlight"><pre><span></span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionStatement</span><span class="o">:</span><span class="w"> </span><span class="n">print</span><span class="o">(</span><span class="n">a</span><span class="o">);</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">CallExpression</span><span class="o">:</span><span class="w"> </span><span class="n">print</span><span class="o">,</span><span class="w"> </span><span class="n">a</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">Identifier</span><span class="o">:</span><span class="w"> </span><span class="n">print</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">Identifier</span><span class="o">:</span><span class="w"> </span><span class="n">a</span> <span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">EndOfFileToken</span> </pre></div> <p>And another example:</p> <div class="highlight"><pre><span></span><span class="mf">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">3</span><span class="p">;</span> </pre></div> <p>Will be built into a ts.Node tree along these lines:</p> <div class="highlight"><pre><span></span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">ExpressionStatement</span><span
class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mi">3</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">BinaryExpression</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span><span class="o">,</span><span class="w"> </span><span class="mi">3</span><span class="o">,</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="mi">3</span> <span class="w"> </span><span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">PlusToken</span> <span class="n">Node</span><span class="o">:</span><span class="w"> </span><span class="n">EndOfFileToken</span> </pre></div> <p>But how would one come to know this?</p> <h4 id="exploring-the-ts.node-tree">Exploring the ts.Node tree</h4><p>The easiest thing to do is throw an error on every Node type we don't yet know about and fill in support for each program we throw at the interpreter.</p> <p>For example:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span 
class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">default</span><span class="o">:</span> <span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Unsupported node type: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now let's run our interpreter against an input file, <code>test.ts</code>, that combines these two to make a semi-interesting program:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">2</span><span class="o">)</span><span class="p">;</span> $<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>ExpressionStatement ... </pre></div> <p>And we see an outer wrapper, an ExpressionStatement. To proceed we look up the definition of an ExpressionStatement in TypeScript source code, <a href="https://github.com/Microsoft/TypeScript/blob/master/src/compiler/types.ts">src/compiler/types.ts</a> to be specific. This file will become our best friend. Hit ctrl-f and look for "interface ExpressionStatement ". 
We see that it has only one child, <code>expression</code>, so we call <code>interpretNode</code> on this recursively:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">ExpressionStatement</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">es</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">ExpressionStatement</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">es</span><span class="p">.</span><span class="nx">expression</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">default</span><span class="o">:</span> <span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span 
class="s1">&#39;Unsupported node type: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Thankfully TypeScript will be very quick to call us out if we misunderstand this structure.</p> <p class="note"> It's pretty weird to me that the ts.Node tree is organized such that I must cast at each ts.Node but that's what they do even in the TypeScript source so I don't think I'm misunderstanding. </p><p>Now we recompile and run the interpreter against the program to discover the next ts.Node type.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>CallExpression ... </pre></div> <p>Cool! Back to <a href="https://github.com/Microsoft/TypeScript/blob/master/src/compiler/types.ts">src/compiler/types.ts</a>. Call expressions are complex enough that we'll break out handling them into a separate function.</p> <h3 id="interpretcall-and-ts.callexpression">interpretCall and ts.CallExpression</h3><p>From our reading of types.ts we need to handle the expression that evaluates to a function, and we need to handle its parameters. We'll just call <code>interpretNode</code> on each of these to get their real value. 
And finally we'll call the function with the arguments.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretCall</span><span class="p">(</span><span class="nx">ce</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.CallExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">fn</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">ce</span><span class="p">.</span><span class="nx">expression</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">args</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">ce</span><span class="p">.</span><span class="nx">arguments</span><span class="p">.</span><span class="nx">map</span><span class="p">(</span><span class="nx">interpretNode</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fn</span><span class="p">(...</span><span class="nx">args</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span 
class="p">.</span><span class="nx">CallExpression</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ce</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">CallExpression</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretCall</span><span class="p">(</span><span class="nx">ce</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p class="note"> Please ignore the fact that we are not correctly setting <code>this</code> here. </p><p>Recompile and let's see what's next!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>Identifier ... </pre></div> <p>And back to types.ts.</p> <h3 id="ts.identifier">ts.Identifier</h3><p>In order to support identifiers in general we'd need to have a context we could use to look up the value of an identifier. 
But we don't have a context like this right now, so we'll add built-in support for a <code>print</code> function to get some output!</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">Identifier</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">id</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">Identifier</span><span class="p">).</span><span class="nx">escapedText</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="kt">string</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">id</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;print&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(...</span><span 
class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(...</span><span class="nx">args</span><span class="p">);</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Unsupported identifier: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">id</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Recompile and let's see what's next!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>BinaryExpression ... </pre></div> <p>And we're finally into the parameters.</p> <h3 id="interpretbinaryexpression-and-ts.binaryexpression">interpretBinaryExpression and ts.BinaryExpression</h3><p>Looking into types.ts for this Node type suggests we may want to break this out into its own function as well; there are a ton of operator types. Within the <code>interpretBinaryExpression</code> helper we'll interpret each operand and then switch on the operator type. 
We'll throw an error on operators we don't know about -- all of them at first:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.BinaryExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">left</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">right</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">default</span><span class="o">:</span> <span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Unsupported operator: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">be</span><span class="p">.</span><span 
class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">BinaryExpression</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">BinaryExpression</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>We know the drill.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... 
Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>FirstLiteralToken ... </pre></div> <p>At this point we're actually failing first on an unknown <strong>node type</strong> rather than an operator. This is because we interpret the operands (which are numeric literals) before we look up the operator. Time to revisit types.ts!</p> <h3 id="ts.firstliteraltoken,-ts.numericliteral">ts.FirstLiteralToken, ts.NumericLiteral</h3><p>Looking at types.ts shows us that <code>FirstLiteralToken</code> is a synonym for <code>NumericLiteral</code>: both names share the same enum value, and the reverse lookup in our error message happens to print the alias. The latter name is more obvious, so let's add that to our supported Node list:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">.</span><span class="nx">NumericLiteral</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nl</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">node</span><span class="w"> </span><span class="kr">as</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">NumericLiteral</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nb">Number</span><span class="p">(</span><span class="nx">nl</span><span class="p">.</span><span class="nx">text</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we keep going!</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>operator:<span class="w"> </span>PlusToken ... </pre></div> <p>And we're into unknown operator territory!</p> <h3 id="interpretbinaryexpression-and-ts.plustoken">interpretBinaryExpression and ts.PlusToken</h3><p>As a simple extension to our existing <code>interpretBinaryExpression</code>, we return the sum of the left and right values:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretBinaryExpression</span><span class="p">(</span><span class="nx">be</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.BinaryExpression</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">left</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span 
class="nx">right</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts.SyntaxKind.PlusToken</span><span class="o">:</span> <span class="w"> </span><span class="kt">return</span><span class="w"> </span><span class="nx">left</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">right</span><span class="p">;</span> <span class="w"> </span><span class="nx">default</span><span class="o">:</span> <span class="w"> </span><span class="kt">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Unsupported operator: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">ts</span><span class="p">.</span><span class="nx">SyntaxKind</span><span class="p">[</span><span class="nx">be</span><span class="p">.</span><span class="nx">operatorToken</span><span class="p">.</span><span class="nx">kind</span><span class="p">]);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And we give it another shot.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts ... Error:<span class="w"> </span>Unsupported<span class="w"> </span>node<span class="w"> </span>type:<span class="w"> </span>EndOfFileToken ... 
</pre></div> <h3 id="ts.syntaxkind.endoffiletoken">ts.SyntaxKind.EndOfFileToken</h3><p>This is our final Node type before a working program. We simply do nothing:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">interpretNode</span><span class="p">(</span><span class="nx">node</span><span class="o">:</span><span class="w"> </span><span class="kt">ts.Node</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">node</span><span class="p">.</span><span class="nx">kind</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="nx">ts.SyntaxKind.EndOfFileToken</span><span class="o">:</span> <span class="w"> </span><span class="kt">break</span><span class="p">;</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>One more time:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>tsc<span class="w"> </span>interpreter.ts $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts <span class="m">3</span> </pre></div> <p>A working program! 
And if we jiggle the test?</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.ts print<span class="o">(</span><span class="m">1</span><span class="w"> </span>+<span class="w"> </span><span class="m">5</span><span class="o">)</span><span class="p">;</span> $<span class="w"> </span>node<span class="w"> </span>interpreter.js<span class="w"> </span>test.ts <span class="m">6</span> </pre></div> <p>We're well on our way to interpreting TypeScript, and have gained some familiarity with the TypeScript Compiler API.</p> <p><a href="https://github.com/eatonphil/jsi">All code is available on Github.</a></p> http://notes.eatonphil.com/interpreting-typescript.htmlSun, 14 Apr 2019 00:00:00 +0000Writing a web server from scratch: 1. HTTP and socketshttp://notes.eatonphil.com/web-server-basics-http-and-sockets.html<p>Say we have some HTML:</p> <div class="highlight"><pre><span></span><span class="p">&lt;</span><span class="nt">html</span><span class="p">&gt;</span> <span class="p">&lt;</span><span class="nt">body</span><span class="p">&gt;</span> <span class="p">&lt;</span><span class="nt">h1</span><span class="p">&gt;</span>Hello world!<span class="p">&lt;/</span><span class="nt">h1</span><span class="p">&gt;</span> <span class="p">&lt;/</span><span class="nt">body</span><span class="p">&gt;</span> <span class="p">&lt;/</span><span class="nt">html</span><span class="p">&gt;</span> </pre></div> <p>And say we'd like to be able to render this page in a web browser. 
If the server is hosted locally we may want to enter <code>localhost:9000/hello-world.html</code> in the address bar, hit enter, make a request (done by the browser), receive a response (sent by some server), and render the result (done by the browser).</p> <p>Here is a minimal, often incomplete, and unsafe Node.js program (about 100 LoC) that would serve this (<a href="https://github.com/eatonphil/uweb">code available on Github</a>):</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">net</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;net&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;\r\n&#39;</span><span class="p">;</span> <span class="kd">const</span><span class="w"> </span><span class="nx">HELLO_WORLD</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`&lt;html&gt;</span> <span class="sb"> &lt;body&gt;</span> <span class="sb"> &lt;h1&gt;Hello world!&lt;/h1&gt;</span> <span class="sb"> &lt;/body&gt;</span> <span class="sb">&lt;/html&gt;`</span><span class="p">;</span> <span class="kd">const</span><span class="w"> </span><span class="nx">NOT_FOUND</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="sb">`&lt;html&gt;</span> <span class="sb"> &lt;body&gt;</span> <span class="sb"> &lt;h1&gt;Not found&lt;/h1&gt;</span> <span class="sb"> &lt;/body&gt;</span> <span 
class="sb">&lt;/html&gt;`</span><span class="p">;</span> <span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">statusLine</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span> <span class="w"> </span><span class="nx">headers</span><span class="o">:</span><span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">buffer</span><span class="p">.</span><span class="nx">toString</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span> <span class="w"> </span><span class="c1">// 
Parse/store status line if necessary</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Parse/store headers if the body hasn&#39;t begun</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">();</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Reached the end of headers, double CRLF</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span 
class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">key</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">].</span><span class="nx">push</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">trimStart</span><span class="p">());</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">CRLF</span><span 
class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">contentLength</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">&#39;content-length&#39;</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">[];</span> <span 
class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">method</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">&#39;GET&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">contentLength</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">status</span><span class="o">:</span><span class="w"> </span><span class="mf">200</span><span class="p">,</span><span class="w"> </span><span class="nx">statusMessage</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;OK&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span 
class="s1">&#39;&#39;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">path</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;/hello-world.html&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">HELLO_WORLD</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">404</span><span class="p">;</span> <span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">statusMessage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;NOT FOUND&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">NOT_FOUND</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">serialized</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">`HTTP/1.1 ${response.status} ${response.statusMessage}`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="s1">&#39;Content-Length: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">CRLF</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="nx">response</span><span class="p">.</span><span class="nx">body</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">write</span><span class="p">(</span><span class="nx">serialized</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">requestComplete</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span 
class="p">.</span><span class="nx">sendResponse</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Otherwise the connection may be reused, which we don&#39;t support.</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">end</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="p">(</span><span class="nx">connection</span><span class="p">);</span> <span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">handler</span><span class="p">.</span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">));</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">createServer</span><span class="p">(</span><span class="nx">handleConnection</span><span class="p">);</span> <span class="nx">server</span><span 
class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="s1">&#39;9000&#39;</span><span class="p">);</span> </pre></div> <p>So what's going on?</p> <h3 id="the-protocol">The protocol</h3><p>HTTP (version 1.1, specifically) is a convention for connecting over TCP/IP and sending plain-text messages between two processes. HTTP messages are broken into two categories: requests (the sender of a request is called a "client") and responses (the sender of a response is called a "server").</p> <p>HTTP is important because it is the default protocol of web browsers. When we type in <code>localhost:9000/hello-world.html</code> and hit enter, the browser will open a TCP/IP connection to the location <code>localhost</code> on the port <code>9000</code> and send an HTTP request. If/when it receives the HTTP response from the server it will try to render the response.</p> <h4 id="an-http-request">An HTTP request</h4><p>A bare minimum HTTP/1.1 request (<a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html">defined here</a>) based on the request for <code>localhost:9000/hello-world.html</code> is the following:</p> <div class="highlight"><pre><span></span>GET /hello-world.html HTTP/1.1\r\nHost: localhost:9000\r\n\r\n </pre></div> <p class="note"> The spec explicitly requires the <code>\r\n</code> combo to represent a newline instead of simply <code>\n</code>. 
</p><p>If we printed out this request it would look like this:</p> <div class="highlight"><pre><span></span>GET /hello-world.html HTTP/1.1 Host: localhost:9000 </pre></div> <h4 id="components-of-an-http-request">Components of an HTTP request</h4><p>An HTTP/1.1 request is made up of a few parts:</p> <ul> <li>[Mandatory]: The status line (the first line) followed by a CRLF (the <code>\r\n</code> combo)</li> <li>[Mandatory]: HTTP headers separated by a CRLF and followed by an additional CRLF</li> <li>[Optional]: The request body</li> </ul> <p>The status line consists of the request method (e.g. GET, POST, PUT, etc.), the path for the request, and the protocol -- all separated by a space.</p> <p>An HTTP header is a key-value pair separated by a colon. Spaces following the colon are ignored. The key is case insensitive. Only the <code>Host</code> header appears to be mandatory. Since these headers are sent by the client they are intended for the server's use.</p> <p>The request body is text and is only relevant for requests of certain methods (e.g. 
POST but not GET).</p> <h4 id="an-http-response">An HTTP response</h4><p>A bare minimum HTTP/1.1 response (<a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html">defined here</a>) based on the file we wanted to send back is the following:</p> <div class="highlight"><pre><span></span>HTTP/1.1 200 OK\r\n\r\n&lt;html&gt;\n &lt;body&gt;\n &lt;h1&gt;Hello world!&lt;/h1&gt;\n &lt;/body&gt;\n&lt;/html&gt; </pre></div> <p>If we printed out this response it would look like this:</p> <div class="highlight"><pre><span></span>HTTP/1.1 200 OK &lt;html&gt; &lt;body&gt; &lt;h1&gt;Hello world!&lt;/h1&gt; &lt;/body&gt; &lt;/html&gt; </pre></div> <h4 id="components-of-an-http-response">Components of an HTTP response</h4><p>An HTTP/1.1 response is made up of a few parts:</p> <ul> <li>[Mandatory]: The status line (the first line) followed by a CRLF</li> <li>[Optional]: HTTP headers separated by a CRLF and followed by an additional CRLF</li> <li>[Optional]: The response body</li> </ul> <p>The status line consists of the protocol, the status code, and the status message -- all separated by a space.</p> <p>HTTP response headers are the same as HTTP request headers although in a response they are directives from the server to the client. There are many <a href="https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">standard headers</a> that are used for such things as setting cache rules, setting cookies, and setting the response type (e.g. HTML vs CSS vs PNG so the browser knows how to handle the response).</p> <p>The response body is similar to the HTTP request body.</p> <h3 id="sockets">Sockets</h3><p>Most operating systems have a built-in means of connecting over TCP/IP (and sending and receiving messages) called "sockets". Sockets allow us to treat TCP/IP connections like files in memory. Most programming languages have a built-in socket library. 
Node.js provides a high-level interface for listening on a port and handling new connections.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">doSomething</span><span class="o">???</span><span class="p">);</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span class="nx">server</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">net</span><span class="p">.</span><span class="nx">createServer</span><span class="p">(</span><span class="nx">handleConnection</span><span class="p">);</span> <span class="nx">server</span><span class="p">.</span><span class="nx">listen</span><span class="p">(</span><span class="s1">&#39;9000&#39;</span><span class="p">);</span> </pre></div> <p>Once the program is listening, clients can open TCP/IP connections to the address (<code>localhost</code>) and port (<code>9000</code>) and our program takes over from there. Each connection is handled separately and receives "data" events. Each data event includes new bytes available for us to handle.</p> <p>Let's encapsulate the state of each connection in an <code>HTTPRequestHandler</code> class. 
Its function will be to parse data as it becomes available and respond to the request when the request is done.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">requestComplete</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">sendResponse</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Otherwise the connection may be reused, which we don&#39;t support.</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="p">.</span><span class="nx">end</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">handleConnection</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">handler</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="p">(</span><span class="nx">connection</span><span class="p">);</span> <span class="w"> </span><span class="nx">connection</span><span class="p">.</span><span class="nx">on</span><span class="p">(</span><span class="s1">&#39;data&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">handler</span><span class="p">.</span><span class="nx">handle</span><span class="p">(</span><span class="nx">buffer</span><span class="p">));</span> <span class="p">}</span> <span class="p">...</span> </pre></div> <p>There are three functions we need to implement now: <code>parse(buffer)</code>, <code>requestComplete()</code>, and <code>sendResponse()</code>.</p> <h4 id="parse(buffer)">parse(buffer)</h4><p>This function will be responsible for progressively pulling out data 
from the buffer. If the status line has not been received, it will try to grab the status line. If the body has not yet started, it will accumulate headers. Then it will continue accumulating the body until we close the connection (this will happen implicitly when <code>requestComplete()</code> returns true).</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">(</span><span class="nx">connection</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">connection</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">connection</span><span class="p">;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">statusLine</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span> <span class="w"> </span><span class="nx">headers</span><span class="o">:</span><span class="w"> </span><span class="p">{},</span> <span class="w"> </span><span class="nx">body</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="p">,</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">buffer</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">lines</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="nx">buffer</span><span class="p">.</span><span class="nx">toString</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Parse/store status line if necessary</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">().</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">method</span><span class="p">,</span><span class="w"> </span><span class="nx">path</span><span class="p">,</span><span class="w"> </span><span class="nx">protocol</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Parse/store headers if the body hasn&#39;t begun</span> <span class="w"> </span><span 
class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">();</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">line</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">shift</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Reached the end of headers, double CRLF</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">line</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">key</span><span class="p">,</span><span class="w"> </span><span class="nx">value</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">line</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeKey</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">key</span><span class="p">.</span><span class="nx">toLowerCase</span><span class="p">();</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="nx">safeKey</span><span class="p">].</span><span class="nx">push</span><span class="p">(</span><span class="nx">value</span><span class="p">.</span><span class="nx">trimStart</span><span class="p">());</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">lines</span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="nx">CRLF</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">...</span> <span class="p">}</span> </pre></div> <h4 id="requestcomplete()">requestComplete()</h4><p>This function will look at the internal request state and return false if the status line has not been received, no headers have been received (although this is stricter than the HTTP/1.1 standard requires), or if the body length is not equal to the value of the <code>Content-Length</code> header.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span> <span class="p">...</span> <span class="w"> </span><span class="nx">requestComplete</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="o">!</span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="k">this</span><span 
class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="kc">null</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">contentLength</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">headers</span><span class="p">[</span><span class="s1">&#39;content-length&#39;</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">statusLine</span><span class="p">.</span><span class="nx">method</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">&#39;GET&#39;</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">request</span><span class="p">.</span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="nb">Number</span><span class="p">(</span><span class="nx">contentLength</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">false</span><span 
class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">...</span> <span class="p">}</span> </pre></div> <h4 id="sendresponse()">sendResponse()</h4><p>Finally we'll hard-code two responses (one for the valid request for /hello-world.html and a catch-all 404 response for every other request). These responses need to be serialized according to the HTTP response format described above and written to the connection.</p> <div class="highlight"><pre><span></span><span class="k">class</span><span class="w"> </span><span class="n">HTTPRequestHandler</span><span class="w"> </span><span class="p">{</span> <span class="o">...</span> <span class="w"> </span><span class="n">sendResponse</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">response</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">status</span><span class="p">:</span><span class="w"> </span><span class="mi">200</span><span class="p">,</span><span class="w"> </span><span class="n">statusMessage</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;OK&#39;</span><span class="p">,</span><span class="w"> </span><span class="n">body</span><span class="p">:</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">this</span><span class="o">.</span><span class="n">request</span><span class="o">.</span><span class="n">statusLine</span><span class="o">.</span><span class="n">path</span><span class="w"> </span><span 
class="o">===</span><span class="w"> </span><span class="s1">&#39;/hello-world.html&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">HELLO_WORLD</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">status</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">404</span><span class="p">;</span> <span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">statusMessage</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;NOT FOUND&#39;</span><span class="p">;</span> <span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">NOT_FOUND</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">const</span><span class="w"> </span><span class="n">serialized</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">`HTTP/1.1 ${response.status} ${response.statusMessage}`</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="s1">&#39;Content-Length: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="o">.</span><span 
class="n">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">CRLF</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="n">response</span><span class="o">.</span><span class="n">body</span><span class="p">;</span> <span class="w"> </span><span class="n">this</span><span class="o">.</span><span class="n">connection</span><span class="o">.</span><span class="n">write</span><span class="p">(</span><span class="n">serialized</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span> <span class="o">...</span> <span class="p">}</span> </pre></div> <h3 id="run-it">Run it</h3><p>Now that we've got all the pieces we can finally run the initial program:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>uweb.js<span class="w"> </span><span class="p">&amp;</span> $<span class="w"> </span>open<span class="w"> </span>localhost:9000/hello-world.html </pre></div> <p>And we see the page! Try any other path and we receive a 404.</p> <h3 id="review-and-next-steps">Review and next steps</h3><p>We covered the basics of HTTP/1.1: a very simple, plain-text protocol oriented around requests and responses over a TCP/IP connection. We saw that we need to know little beyond parsing and formatting text on top of the TCP/IP black box exposed as sockets. We created a simple application that returns different responses based on the request. But we're still a long way from a more general library, a web framework. 
Future posts will explore this transition as well as performance and more features.</p> <p><a href="https://github.com/eatonphil/uweb">Code is available on GitHub</a>.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">First post in a new series on web server basics starting with HTTP and sockets (using JavaScript/Node.js). <a href="https://t.co/uBiNfOBJeZ">https://t.co/uBiNfOBJeZ</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1114988522702823424?ref_src=twsrc%5Etfw">April 7, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/web-server-basics-http-and-sockets.htmlSat, 06 Apr 2019 00:00:00 +0000Writing a simple JSON path parserhttp://notes.eatonphil.com/writing-a-simple-json-path-parser.html<p>Let's say we want to implement a simple list filtering language so we can enter <code>a.b = 12</code> and return only results in a list where the <code>a</code> column is an object that contains a field <code>b</code> that is set to the value 12. What would a <code>filter(jsonPath, equals, listOfObjects)</code> function look like?</p> <p>If we only needed to support object lookup, we might implement <code>filter</code> by splitting the <code>jsonPath</code> on periods and looking at each object in the <code>listOfObjects</code> for matching values. 
It might look something like this:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">object</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span 
class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span> <span class="p">}</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;assert&#39;</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span> <span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="s1">&#39;foo.bar&#39;</span><span class="p">,</span><span class="w"> </span><span class="mf">12</span><span class="p">,</span><span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span 
class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="mf">12</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">}]),</span> <span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="mf">12</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}],</span> <span class="p">);</span> </pre></div> <p>That doesn't work too badly. We haven't handled edge cases like a <code>jsonPath</code> of <code>foo..bar</code> or <code>bar.</code>. 
But those would not be difficult to handle:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot begin with a dot, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;.&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot end with a dot, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">jsonPath</span><span class="p">.</span><span class="nx">split</span><span class="p">(</span><span class="s1">&#39;.&#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">parts</span><span class="p">.</span><span class="nx">reduce</span><span class="p">((</span><span class="nx">hasEmptyPart</span><span class="p">,</span><span class="w"> </span><span class="nx">part</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">hasEmptyPart</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">part</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="kc">false</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot contain an empty section, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">jsonPath</span><span 
class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">object</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span 
class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And we now handle the most obvious invalid path cases.</p> <h3 id="arrays?">Arrays?</h3><p>If we want to support array path syntax, things get harder. For example:</p> <div class="highlight"><pre><span></span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;assert&#39;</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span> <span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="s1">&#39;foo.bar[0].biz&#39;</span><span class="p">,</span><span class="w"> </span><span class="mf">14</span><span class="p">,</span><span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">14</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span 
class="o">:</span><span class="w"> </span><span class="mf">19</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="kc">null</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}]),</span> <span class="w"> </span><span class="p">[{</span><span class="w"> </span><span class="nx">foo</span><span class="o">:</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">bar</span><span class="o">:</span><span class="w"> </span><span class="p">[</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">14</span><span class="w"> </span><span class="p">},</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">biz</span><span class="o">:</span><span class="w"> </span><span class="mf">19</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">]</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="p">}],</span> <span class="p">);</span> </pre></div> <p>We could try to stick with the hammer that is <code>String.prototype.split</code> and write some really messy code. :) Or we could switch to an approach that gives us more control. Let's do that.</p> <p>We'll build a very simple lexer that iterates over the path one character at a time, accumulating characters into tokens that represent the pieces of the path. 
Let's start by supporting the original <code>jsonPath</code> syntax and error-handling.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">path</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="k">case</span><span class="w"> </span><span class="s1">&#39;.&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">currentToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot contain empty section, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span 
class="nx">currentToken</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot end with dot, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">parts</span><span class="p">;</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">filter</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">,</span><span class="w"> </span><span class="nx">equals</span><span class="p">,</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">jsonPath</span><span class="p">);</span> <span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">filterSingle</span><span class="p">(</span><span class="nx">object</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="nx">object</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">;</span><span class="w"> </span><span class="nx">part</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parts</span><span class="p">[</span><span class="o">++</span><span class="nx">i</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="p">[</span><span class="nx">part</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">objectAtPath</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="nx">equals</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span 
class="k">return</span><span class="w"> </span><span class="nx">listOfObjects</span><span class="p">.</span><span class="nx">filter</span><span class="p">(</span><span class="nx">filterSingle</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Not too bad!</p> <h3 id="arrays?">Arrays?</h3><p>Right. Let's build on <code>getJsonPathParts</code> to support array syntax. Along with that, we're going to impose some restrictions. Object path parts may contain only alphanumeric characters, dashes, and underscores. An array index may contain only numeric characters. Anything else should throw an error.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="nx">path</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">parts</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span 
class="nx">path</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">path</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">c</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;.&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot contain empty section, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;[&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path contains unexpected left bracket, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot contain empty section, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span 
class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">true</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;]&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="o">!</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path contains unexpected right bracket, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path array index must not be empty, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span 
class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Array indices are recorded as numbers, not strings.</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">parseInt</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">,</span><span class="w"> </span><span class="mf">10</span><span class="p">);</span> <span class="w"> </span><span class="nx">inArray</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">false</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">inArray</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">&#39;0&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">&lt;=</span><span 
class="w"> </span><span class="s1">&#39;9&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path array index must be numeric, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">((</span><span class="nx">code</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">&#39;A&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s1">&#39;Z&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">&#39;a&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s1">&#39;z&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="p">(</span><span class="nx">code</span><span class="w"> </span><span class="o">&gt;=</span><span class="w"> </span><span class="s1">&#39;0&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span 
class="p">(</span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">code</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="s1">&#39;9&#39;</span><span class="p">.</span><span class="nx">charCodeAt</span><span class="p">(</span><span class="mf">0</span><span class="p">))</span><span class="w"> </span><span class="o">||</span> <span class="w"> </span><span class="p">[</span><span class="s1">&#39;-&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">].</span><span class="nx">includes</span><span class="p">(</span><span class="nx">c</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span> <span class="w"> </span><span class="k">continue</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path part must contain only alphanumeric characters, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;JSON path cannot end with dot, in: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">path</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">parts</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">parts</span><span class="p">;</span> <span class="p">}</span> <span class="nx">require</span><span class="p">(</span><span class="s1">&#39;assert&#39;</span><span class="p">).</span><span class="nx">deepEqual</span><span class="p">(</span><span class="nx">getJsonPathParts</span><span class="p">(</span><span class="s1">&#39;foo.bar[0].biz&#39;</span><span class="p">),</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;foo&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;bar&#39;</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;biz&#39;</span><span class="p">]);</span> </pre></div> <p>Now we've got a simple JSON path parser with decent error handling! Of course we wouldn't want to use this little library in production until we had some serious test coverage. But writing tests and calling out my mistakes will be left here as an exercise for the reader. 
:)</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">New (short) post on parsing JSON paths in JavaScript <a href="https://t.co/mIjOMugA7C">https://t.co/mIjOMugA7C</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1111262461074784256?ref_src=twsrc%5Etfw">March 28, 2019</a></blockquote></p> http://notes.eatonphil.com/writing-a-simple-json-path-parser.htmlWed, 27 Mar 2019 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 3. LLVMhttp://notes.eatonphil.com/compiler-basics-llvm.html<p class="note"> Previously in compiler basics: <!-- forgive me, for I have sinned --> <br /> <a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a> <br /> <a href="/compiler-basics-functions.html">2. user-defined functions and variables</a> <br /> <br/> Next in compiler basics: <br /> <a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a> <br /> <a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a> <br /> <a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a> </p><p>In this post we'll extend the <a href="https://github.com/eatonphil/ulisp">compiler</a> to emit <a href="https://llvm.org/docs/LangRef.html">LLVM IR</a> as an option instead of x86 assembly.</p> <p><a href="https://github.com/eatonphil/ulisp">All source code is available on GitHub</a>.</p> <p>LLVM IR is a portable, human-readable, typed, assembly-like syntax that LLVM can apply <a href="https://llvm.org/docs/Passes.html">optimizations</a> to before generating assembly for the target architecture. 
Many language implementors choose to compile to LLVM IR specifically to avoid needing to implement sophisticated optimizations.</p> <p>But the biggest reason I'm adding an LLVM backend is so that I can punt on implementing <a href="https://en.wikipedia.org/wiki/Register_allocation">register allocation</a>. This is the technique that allows you to generically use as many registers as possible before storing local variables on the stack. While register allocation algorithms are not <em>that</em> difficult, I got bored/lazy trying to implement this for ulisp. And LLVM IR provides "infinite" locals that LLVM maps to registers and the stack as needed, which is exactly the work register allocation does.</p> <h3 id="llvm-ir-basics">LLVM IR basics</h3><p>In LLVM IR, all local variables must be prefixed with <code>%</code>. All global variables (including function names) must be prefixed with <code>&#64;</code>. LLVM IR must be in <a href="https://www.cs.cmu.edu/~fp/courses/15411-f08/lectures/09-ssa.pdf">static single assignment</a> (SSA) form, which means that no variable is assigned more than once. Additionally, literals cannot be assigned to variables directly, so we'll work around that by adding 0 to the literal. Furthermore, we'll take advantage of the <code>add</code>, <code>sub</code>, and <code>mul</code> operations built into LLVM IR.</p> <div class="highlight"><pre><span></span><span class="c">; x = 4</span> <span class="nv">%x</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">4</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> </pre></div> <p>The type that the operation is operating on must be specified after the operation name. 
In this case we are specifying that <code>add</code> is operating on and returning 32-bit integers.</p> <p>While this might seem very inefficient, we'll see in the end that LLVM easily optimizes this away.</p> <h4 id="function-definition">Function definition</h4><p>Functions are defined at the top level and are much simpler than x86 assembly since the details of calling conventions are handled by LLVM.</p> <div class="highlight"><pre><span></span><span class="c">; (def plus (a b) (+ a b))</span> <span class="k">define</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus</span><span class="w"> </span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">%res</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="nv">%b</span> <span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%res</span> <span class="p">}</span> </pre></div> <p>In ulisp, all functions will return a result (and the only supported type for now is the 32-bit integer). So we annotate the definition with this return type (<code>i32</code> in <code>define i32</code>). Finally, we return inside the function with the <code>ret</code> instruction that must also specify the type (again <code>i32</code>).</p> <h4 id="generating-llvm-ir">Generating LLVM IR</h4><p>We are going to generate LLVM IR as text. 
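</p> <p>As a rough illustration of what generating IR "as text" can mean, here is a minimal sketch that builds the <code>plus</code> function from above line by line and joins the result into one string. The <code>TextEmitter</code> class and its <code>emit</code> and <code>getOutput</code> methods are hypothetical names chosen for this example, not necessarily what ulisp itself uses.</p>

```javascript
// Hypothetical helper for illustration only: collects lines of IR text.
class TextEmitter {
  constructor() {
    this.lines = [];
  }

  // Append one line of IR, indented by `depth` levels of two spaces.
  emit(depth, line) {
    this.lines.push('  '.repeat(depth) + line);
  }

  // Join everything emitted so far into a single program string.
  getOutput() {
    return this.lines.join('\n');
  }
}

const out = new TextEmitter();
out.emit(0, 'define i32 @plus(i32 %a, i32 %b) {');
out.emit(1, '%res = add i32 %a, %b');
out.emit(1, 'ret i32 %res');
out.emit(0, '}');

const ir = out.getOutput();
console.log(ir);
```

<p>Because the output is just a string, it can be written to a file and handed to LLVM's command-line tools.</p> <p>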
But any large project will benefit from generating LLVM IR via <a href="http://llvm.org/docs/ProgrammersManual.html">API</a>.</p> <h3 id="supporting-multiple-backends">Supporting multiple backends</h3><p>The goal is to be able to switch at compile-time between generating x86 assembly or generating LLVM IR. So we'll need to reorganize ulisp a little bit.</p> <p>We'll edit <code>src/ulisp.js</code> to accept a second argument to specify the backend (and from now on we'll default to LLVM).</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;child_process&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./parser&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">backends</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./backend&#39;</span><span class="p">);</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">2</span><span class="p">]).</span><span class="nx">toString</span><span class="p">();</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">backend</span><span class="p">;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">3</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;llvm&#39;</span><span class="o">:</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="kc">undefined</span><span class="o">:</span> <span class="w"> </span><span class="nx">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backends</span><span class="p">.</span><span class="nx">llvm</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;x86&#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">backend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backends</span><span class="p">.</span><span class="nx">x86</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span 
class="nx">log</span><span class="p">(</span><span class="s1">&#39;Unsupported backend &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span class="mf">3</span><span class="p">]);</span> <span class="w"> </span><span class="nx">process</span><span class="p">.</span><span class="nx">exit</span><span class="p">(</span><span class="mf">1</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">input</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">backend</span><span class="p">.</span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">mkdirSync</span><span class="p">(</span><span class="s1">&#39;build&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">backend</span><span class="p">.</span><span class="nx">build</span><span class="p">(</span><span class="s1">&#39;build&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span> <span class="p">}</span> <span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span 
class="p">);</span> </pre></div> <h3 id="the-llvm-backend">The LLVM backend</h3><p>We'll add <code>src/backend/llvm.js</code> and expose <code>compile</code> and <code>build</code> functions.</p> <h4 id="compile(ast)">compile(ast)</h4><p>This will work the same as it did for the x86 backend, creating a new <code>Compiler</code> helper object, creating a scope manager (which we'll get into in more detail shortly), and generating code from the AST wrapped in a <code>begin</code>.</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Compiler</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">scope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(),</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">.</span><span 
class="nx">getOutput</span><span class="p">();</span> <span class="p">};</span> </pre></div> <h4 id="build(builddir,-output)">build(buildDir, output)</h4><p>The job of <code>build</code> will be to clean up the build directory, write any output as needed to the directory, and compile the written output. Since we're dealing with LLVM IR, we first call <a href="https://llvm.org/docs/CommandGuide/llc.html">llc</a> on the IR file to get an assembly file. Then we can call <code>gcc</code> on the assembly to get a binary output.</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;child_process&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="p">...</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">build</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="p">(</span><span class="nx">buildDir</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">prog</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;prog&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span 
class="nx">buildDir</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="sb">`/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.ll`</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span> <span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="sb">`llc -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.ll`</span><span class="p">);</span> <span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="sb">`gcc -o </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb"> </span><span class="si">${</span><span class="nx">buildDir</span><span class="si">}</span><span class="sb">/</span><span class="si">${</span><span class="nx">prog</span><span class="si">}</span><span class="sb">.s`</span><span class="p">);</span> <span class="p">};</span> </pre></div> <h3 id="taking-advantage-of-locals">Taking advantage of locals</h3><p>Before we get too far into the specifics of LLVM IR code generation, let's build out the infrastructure to take advantage of "infinite" locals. 
In particular, we want a local-manager (<code>Scope</code>) with four functions:</p> <ul> <li><code>register(local: name)</code>: for tracking user variables and mapping to safe names</li> <li><code>symbol()</code>: for tracking internal temporary variables</li> <li><code>get(local: name)</code>: for returning the safe name of a user variable</li> <li><code>copy()</code>: for duplicating the local-tracker when we enter a new scope</li> </ul> <p>It is important to map user variables to safe names so that names chosen by the user never collide with names the compiler generates for itself.</p> <h4 id="register(local)">register(local)</h4><p>When we register, we'll want to replace any unsafe characters that Lisp allows but LLVM likely won't. For now, we'll just replace any dashes in the name (since dashes are fine in variables in Lisp) with underscores. Then, if that name is already taken, we'll append an increasing number until we find one that isn't. Finally we return that safe name after storing the mapping.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">register</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span 
class="nx">local</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">copy</span><span class="p">;</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">copy</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">copy</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">base</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">n</span><span class="o">++</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">copy</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">copy</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="symbol()">symbol()</h4><p>This is a simple function that will return one new unused safe name that we can store things in.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="p">...</span> <span class="w"> </span><span class="nx">symbol</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">nth</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">Object</span><span class="p">.</span><span class="nx">keys</span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">).</span><span class="nx">length</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="s1">&#39;sym&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">nth</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>We start off by making up a name based on the prefix <code>sym</code> and a suffix of the current key length and pass that into the <code>register</code> function to make sure we get a safe name.</p> <h4 id="get(local)">get(local)</h4><p>This function is a very simple lookup to return the safe name for a user variable. 
It is up to the caller of this function to handle if the user variable does not exist in scope (and perhaps throw a compiler error back to the programmer).</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">get</span><span class="p">(</span><span class="nx">local</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="p">[</span><span class="nx">local</span><span class="p">];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <h4 id="copy()">copy()</h4><p>Finally, we want to expose a copy function so we can duplicate the local storage before entering a new scope. 
(A variable inside a function should not exist in scope outside the function.)</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Scope</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">copy</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">c</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nx">Scope</span><span class="p">();</span> <span class="w"> </span><span class="nx">c</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="k">this</span><span class="p">.</span><span class="nx">locals</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">c</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <h3 id="back-to-codegen!">Back to codegen!</h3><p>As mentioned in <code>module.exports.compile</code>, we're going to use a <code>Compiler</code> that exposes a number of compiler helpers:</p> <ul> <li><code>emit(depth, code)</code>: an internal helper for outputting indented lines of code</li> <li><code>compileBegin(ast, destination, scope)</code>: compiles a begin block</li> <li><code>compileExpression(ast, destination, scope)</code>: compiles variable references and literals, and passes function calls on to <code>compileCall</code></li> <li><code>compileCall(functionName, arguments, destination, scope)</code>: compiles a function call</li> 
<li><code>compileDefine([functionName, parameters, ...body], destination, scope)</code>: compiles a function definition</li> <li><code>compileOp(op)</code>: helper function for generating code for primitive operations like <code>add</code></li> <li><code>getOutput()</code>: returns the code generated by the compiler</li> </ul> <h4 id="emit(depth,-code)">emit(depth, code)</h4><p>As in the x86 backend, this indents the code by two spaces <code>depth</code> times and appends it to the buffer where we accumulate generated code.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">outBuffer</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <h4 id="compilebegin(ast,-destination,-scope)">compileBegin(ast, destination, scope)</h4><p>Our first compiler function actually does no code generation itself. We'll call <code>compileExpression</code> on each item within the begin block. And we'll pass the <code>destination</code> to the last expression in the list so that the value of a begin block is set to the value of its last expression. All other expressions will receive a temporary variable to store results.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileBegin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span> <span class="w"> </span><span class="nx">expression</span><span class="p">,</span> <span class="w"> </span><span class="nx">i</span><span class="w"> 
</span><span class="o">===</span><span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">(),</span> <span class="w"> </span><span class="nx">scope</span><span class="p">,</span> <span class="w"> </span><span class="p">),</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>Example:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">begin</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="w"> </span><span class="c1">; returns 2</span> </pre></div> <h4 id="compileexpression(ast,-destination,-scope)">compileExpression(ast, destination, scope)</h4><p>This is the most generic compile function. If the ast is a list (representing a function call), it will pass compilation off to <code>compileCall</code>. 
Otherwise the only non-function call parts of the language are variable references and numeric literals.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">exp</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">exp</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">exp</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">exp</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// If numeric literal, store to destination register by adding 0.</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span 
class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span class="p">(</span><span class="nx">exp</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = add i32 </span><span class="si">${</span><span class="nx">exp</span><span class="si">}</span><span class="sb">, 0`</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// If is local, store to destination register similarly.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">exp</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">res</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = add i32 %</span><span class="si">${</span><span class="nx">res</span><span class="si">}</span><span class="sb">, 0`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> 
</span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span> <span class="w"> </span><span class="s1">&#39;Attempt to reference undefined variable or unsupported literal: &#39;</span><span class="w"> </span><span class="o">+</span> <span class="w"> </span><span class="nx">exp</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>Example:</p> <div class="highlight"><pre><span></span><span class="mi">1</span> <span class="o">...</span> <span class="nv">a</span> <span class="o">...</span> <span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="nv">a</span><span class="p">)</span> </pre></div> <h4 id="compilecall(functionname,-arguments,-destination,-scope)">compileCall(functionName, arguments, destination, scope)</h4><p>For most function calls, we compile the arguments before emitting the call itself. However, certain control-flow primitives don't do this (e.g. <code>def</code> and <code>if</code>). Macros in Lisp allow you to add new control-flow primitives (even if you don't use them to modify control flow). But we will ignore user-defined primitives for now.</p> <p>We'll keep a list of control-flow primitives and pass off compilation to them if the function name matches a primitive. 
Otherwise, we'll look up the function name in scope (to find its safe name), compile the arguments, and call the function with the results of the arguments.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileCall</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span 
class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">validFunction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">get</span><span class="p">(</span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">validFunction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeArgs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span> <span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">a</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="kd">const</span><span class="w"> </span><span class="nx">res</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">res</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="s1">&#39;i32 %&#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">res</span><span class="p">;</span> <span class="w"> </span><span class="p">})</span> <span class="w"> </span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;, &#39;</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = call i32 @</span><span class="si">${</span><span class="nx">validFunction</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeArgs</span><span class="si">}</span><span class="sb">)`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span 
class="s1">&#39;Attempt to call undefined function: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>Yay LLVM for simplifying calls!</p> <p>Example:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">foo</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span> <span class="o">...</span> <span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> </pre></div> <h4 id="compiledefine([functionname,-parameters,-...body],-destination,-scope)">compileDefine([functionName, parameters, ...body], destination, scope)</h4><p>This is the last undefined compile function we've used. The call signature may look funny but we write less code if we keep the primitive signatures the same. In any case, JavaScript's destructuring makes it pretty enough.</p> <p>Aside from code generation, we also need to add the function itself to scope so we can look it up later in use. Additionally we need to create a copy of the current scope for the body of the function. 
And we'll add the parameter names themselves to the child scope.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileDefine</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeName</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">name</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">copy</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">safeParams</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">param</span><span 
class="p">)</span><span class="w"> </span><span class="p">=&gt;</span> <span class="w"> </span><span class="c1">// Store parameter mapped to associated local</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">register</span><span class="p">(</span><span class="nx">param</span><span class="p">),</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span> <span class="w"> </span><span class="mf">0</span><span class="p">,</span> <span class="w"> </span><span class="sb">`define i32 @</span><span class="si">${</span><span class="nx">safeName</span><span class="si">}</span><span class="sb">(</span><span class="si">${</span><span class="nx">safeParams</span> <span class="w"> </span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">p</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="sb">`i32 %</span><span class="si">${</span><span class="nx">p</span><span class="si">}</span><span class="sb">`</span><span class="p">)</span> <span class="w"> </span><span class="p">.</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39;, &#39;</span><span class="p">)</span><span class="si">}</span><span class="sb">) {`</span><span class="p">,</span> <span class="w"> </span><span class="p">);</span> <span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">ret</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">childScope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span 
class="nx">compileExpression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ret</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`ret i32 %</span><span class="si">${</span><span class="nx">ret</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;}\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>Example:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">))</span> </pre></div> <h4 id="compileop(op)">compileOp(op)</h4><p>The last function mentioned above will help us expose some useful primitives. 
This function will take a string builtin operation and return a function that can be used to generate code when the operation is called.</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="nx">compileOp</span><span class="p">(</span><span class="nx">op</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">([</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">arg1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">arg2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">scope</span><span class="p">.</span><span class="nx">symbol</span><span class="p">();</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">arg1</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span 
class="k">this</span><span class="p">.</span><span class="nx">compileExpression</span><span class="p">(</span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">arg2</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`%</span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb"> = </span><span class="si">${</span><span class="nx">op</span><span class="si">}</span><span class="sb"> i32 %</span><span class="si">${</span><span class="nx">arg1</span><span class="si">}</span><span class="sb">, %</span><span class="si">${</span><span class="nx">arg2</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>This allows us to add some builtin ops as primitives (even though they aren't control-flow modifying).</p> <div class="highlight"><pre><span></span><span class="kd">class</span><span class="w"> </span><span class="nx">Compiler</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kr">constructor</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">outBuffer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">primitiveFunctions</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileDefine</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileBegin</span><span class="p">.</span><span class="nx">bind</span><span class="p">(</span><span class="k">this</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;add&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;-&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;sub&#39;</span><span class="p">),</span> <span class="w"> </span><span class="s1">&#39;*&#39;</span><span class="o">:</span><span class="w"> </span><span class="k">this</span><span class="p">.</span><span class="nx">compileOp</span><span class="p">(</span><span class="s1">&#39;mul&#39;</span><span class="p">),</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">...</span> <span class="p">}</span> </pre></div> <p>Example:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span> </pre></div> <h3 id="hello-world!">Hello world!</h3><p>Putting it all together, 
we'll compile this Lisp program:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="w"> </span><span class="nv">b</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="mi">2</span><span class="p">)))</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="nv">plus-two</span><span class="w"> </span><span class="mi">3</span><span class="w"> </span><span class="p">(</span><span class="nv">plus-two</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">1</span><span class="p">)))</span> </pre></div> <p>To get 9.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>src/ulisp.js<span class="w"> </span>tests/function_definition.lisp $<span class="w"> </span>./build/prog $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">9</span> </pre></div> <h4 id="generated-code">Generated code</h4><p>The generated LLVM can be found in <code>./build/prog.ll</code>:</p> <div class="highlight"><pre><span></span><span class="k">define</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> 
</span><span class="nv">%b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">%sym7</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%a</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym9</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%b</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym10</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">2</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym8</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym9</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym10</span> <span class="w"> </span><span class="nv">%sym6</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym7</span><span class="p">,</span><span class="w"> </span><span class="nv">%sym8</span> <span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym6</span> <span class="p">}</span> <span class="k">define</span><span class="w"> </span><span 
class="kt">i32</span><span class="w"> </span><span class="vg">@main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nv">%sym6</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">3</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym8</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym9</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">add</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">0</span> <span class="w"> </span><span class="nv">%sym7</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym8</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym9</span><span class="p">)</span> <span class="w"> </span><span class="nv">%sym5</span><span class="w"> </span><span class="p">=</span><span class="w"> </span><span class="k">call</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="vg">@plus_two</span><span class="p">(</span><span class="kt">i32</span><span class="w"> </span><span 
class="nv">%sym6</span><span class="p">,</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym7</span><span class="p">)</span> <span class="w"> </span><span class="k">ret</span><span class="w"> </span><span class="kt">i32</span><span class="w"> </span><span class="nv">%sym5</span> <span class="p">}</span> </pre></div> <p>You can see all these unnecessary <code>add, ... 0</code> instructions. But let's look at the x86 assembly that LLVM generates in <code>build/prog.s</code>:</p> <div class="highlight"><pre><span></span><span class="nf">...</span> <span class="nl">_plus_two:</span><span class="w"> </span><span class="c1">## @plus_two</span> <span class="w"> </span><span class="nf">.cfi_startproc</span> <span class="c1">## %bb.0:</span> <span class="w"> </span><span class="c1">## kill: def $esi killed $esi def $rsi</span> <span class="w"> </span><span class="c1">## kill: def $edi killed $edi def $rdi</span> <span class="w"> </span><span class="nf">leal</span><span class="w"> </span><span class="mi">2</span><span class="p">(</span><span class="o">%</span><span class="nb">rdi</span><span class="p">,</span><span class="o">%</span><span class="nb">rsi</span><span class="p">),</span><span class="w"> </span><span class="o">%</span><span class="nb">eax</span> <span class="w"> </span><span class="nf">retq</span> <span class="w"> </span><span class="nf">.cfi_endproc</span> <span class="w"> </span><span class="c1">## -- End function</span> <span class="nf">...</span> </pre></div> <p>And we see that LLVM easily optimized the inefficiencies away. 
:)</p> <h3 id="next-up">Next up</h3><ul> <li>Compiling conditionals</li> <li>Tail call optimization</li> <li>Lists and dynamic memory</li> <li>Strings?</li> <li>Foreign function calls?</li> <li>Self-hosting?</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Adding an LLVM backend to ulisp (small Lisp compiler in JavaScript) <a href="https://t.co/VIddKW1r3N">https://t.co/VIddKW1r3N</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1104795606365757442?ref_src=twsrc%5Etfw">March 10, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/compiler-basics-llvm.htmlSun, 10 Mar 2019 00:00:00 +0000AOT-compilation of Javascript with V8http://notes.eatonphil.com/aot-compilation-of-javascript-with-v8.html<p>tldr; I'm working on an AOT-compiled Javascript implementation called <a href="https://github.com/eatonphil/jsc">jsc</a>.</p> <p>Many dynamically typed programming languages have implementations that compile to native binaries:</p> <ul> <li>Python: <a href="https://cython.org/">Cython</a></li> <li>Common Lisp: <a href="http://www.sbcl.org/">SBCL</a></li> <li>Scheme: <a href="https://www.call-cc.org/">Chicken Scheme</a></li> </ul> <p>The benefits of compiling dynamically typed languages are similar to those of compiling statically typed languages:</p> <ul> <li>Simplified deployment via a single binary</li> <li>Simplified foreign-function interfaces<ul> <li>e.g. <a href="https://wiki.call-cc.org/An%20extended%20FFI%20example">Embedded C/C++ strings</a></li> </ul> </li> <li>Predictable performance compared to JIT-compiling interpreters</li> <li>Performance gains compared to non-JIT-compiling interpreters</li> </ul> <p>I (re)discovered a common technique for compiling dynamic languages while developing <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>, an interpreter and compiler for Scheme. 
In this technique, you use core parts of the runtime code as a library that is imported and referenced by compiled code.</p> <p>When an interpreter already exists, you save the time you would otherwise spend building object-memory representations, memory management, operations, interaction with existing libraries, etc. Using the runtime as a library (plus existing parser frontends) allows you to focus solely on code generation of control flow.</p> <h3 id="the-first-pass">The first pass</h3><p>I wrote the initial version of <a href="https://github.com/eatonphil/jsc">jsc</a> in Rust using Dave Herman's <a href="https://github.com/dherman/esprit">esprit</a> parser (supports a subset of ES6 that includes all of ES5).</p> <p>The interesting parts of the runtime are taken care of by V8, e.g.:</p> <ul> <li><code>v8::String</code> - a Javascript string object<ul> <li><code>v8::String::NewFromUtf8(isolate, "hello world!")</code> - C++ string to Javascript string object</li> </ul> </li> <li><code>v8::Number</code> - a Javascript number object<ul> <li><code>v8::Number::New(isolate, 10)</code> - C++ double to Javascript number object</li> </ul> </li> <li>Heap allocations</li> <li>Calling convention</li> </ul> <p>And so on.</p> <h4 id="an-example">An example</h4><p>This first version of jsc could take the following Javascript:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span 
class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">b</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">50</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span> <span class="p">}</span> </pre></div> <p>And produce the following C++:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span> <span class="cp">#include</span><span class="w"> </span><span 
class="cpf">&lt;iostream&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;node.h&gt;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Array</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Boolean</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Context</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Exception</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Function</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionTemplate</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionCallbackInfo</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Isolate</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Local</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Null</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Number</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span 
class="n">Object</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">String</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">False</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">True</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Value</span><span class="p">;</span> <span class="kt">void</span><span class="w"> </span><span class="nf">fib_0</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">n_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">a_2</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">b_3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span> <span class="nl">tail_recurse_4</span><span class="p">:</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Context</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ctx_5</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">global_6</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx_5</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">Boolean_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_6</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span 
class="p">,</span><span class="w"> </span><span class="s">&quot;Boolean&quot;</span><span class="p">)));</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_8</span><span class="p">(</span><span class="n">n_1</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_9</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_8</span><span class="p">);</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_10</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_11</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_10</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_12</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsBoolean</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span 
class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsBoolean</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span 
class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_9</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">string_tmp_11</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span 
class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">))))</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_7</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_12</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_13</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// return a;</span> <span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">a_2</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Context</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ctx_14</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">();</span> <span class="w"> </span><span 
class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">global_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ctx_14</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">Boolean_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_15</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Boolean&quot;</span><span class="p">)));</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_17</span><span class="p">(</span><span class="n">n_1</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_18</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_17</span><span class="p">);</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_19</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span 
class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_20</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_19</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_21</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsBoolean</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsBoolean</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span 
class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span 
class="p">((</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_18</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">string_tmp_20</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">))))</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_22</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_16</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_21</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_22</span><span class="o">-&gt;</span><span 
class="n">ToBoolean</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// return b;</span> <span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">b_3</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// return fib(n - 1, b, a + b);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_23</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">n_1</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span 
class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_24</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">b_3</span><span class="p">;</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_25</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">a_2</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">b_3</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span 
class="n">String</span><span class="o">::</span><span class="n">Concat</span><span class="p">(</span><span class="n">a_2</span><span class="o">-&gt;</span><span class="n">ToString</span><span class="p">(),</span><span class="w"> </span><span class="n">b_3</span><span class="o">-&gt;</span><span class="n">ToString</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">((</span><span class="n">a_2</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">b_3</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">a_2</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">b_3</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span 
class="p">(</span><span class="n">isolate</span><span class="p">)));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">FunctionTemplate</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ftpl_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib_0</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_26</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_27</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">fn_26</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;fib_0&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">n_1</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_23</span><span class="p">;</span> <span class="w"> </span><span class="n">a_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_24</span><span class="p">;</span> <span class="w"> </span><span class="n">b_3</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">arg_25</span><span class="p">;</span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span 
class="n">tail_recurse_4</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="nl">tail_recurse_5</span><span class="p">:</span> <span class="w"> </span><span class="c1">// console.log(fib(50, 0, 1))</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">dot_parent_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;console&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">String</span><span class="o">&gt;</span><span class="w"> </span><span class="n">property_8</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;log&quot;</span><span class="p">);</span> <span class="w"> </span><span class="k">while</span><span class="w"> </span><span class="p">(</span><span class="n">dot_parent_7</span><span class="o">-&gt;</span><span class="n">IsObject</span><span class="p">()</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="o">!</span><span class="n">dot_parent_7</span><span class="p">.</span><span class="n">As</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">HasOwnProperty</span><span class="p">(</span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">(),</span><span class="w"> </span><span class="n">property_8</span><span class="p">).</span><span class="n">ToChecked</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">dot_parent_7</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dot_parent_7</span><span class="p">.</span><span class="n">As</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">GetPrototype</span><span class="p">();</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">dot_result_6</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">dot_parent_7</span><span 
class="p">.</span><span class="n">As</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">(),</span><span class="w"> </span><span class="n">property_8</span><span class="p">).</span><span class="n">ToLocalChecked</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">50</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> 
</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">FunctionTemplate</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ftpl_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib_0</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_12</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_13</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">fn_12</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;fib_0&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_14</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_9</span><span class="p">,</span><span class="w"> </span><span class="n">arg_10</span><span class="p">,</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span 
class="n">result_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_12</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">argv_14</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result_15</span><span class="p">;</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">dot_result_6</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_18</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_19</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_17</span><span 
class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">dot_parent_7</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_18</span><span class="p">);</span> <span class="w"> </span><span class="n">result_19</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;jsc_main&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span> <span class="p">}</span> <span class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span> </pre></div> <p>This output gets compiled (by jsc) as a <a href="https://nodejs.org/api/addons.html">Node addon</a> using <a href="https://github.com/nodejs/node-gyp">node-gyp</a>.</p> <p>The compiled addon is loaded by a single-line Javascript file generated by jsc:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>rm<span class="w"> </span>-rf<span class="w"> </span>build $<span class="w"> </span>jsc<span class="w"> </span>fib.js $<span class="w"> </span>cat<span class="w"> </span>build/fib.js require<span class="o">(</span><span class="s2">&quot;build/Release/fib.node&quot;</span><span class="o">)</span>.jsc_main<span class="o">()</span> $<span class="w"> </span>node<span class="w"> 
</span>build/fib.js <span class="m">12586269025</span> </pre></div> <h4 id="analysis">Analysis</h4><p>The generated code was a mess of bad formatting, unnecessary locals, inefficient basic operations (e.g. huge, often unnecessary Boolean conversions), and so on. The unnecessary locals were partly a by-product of single-pass code generation, and the unnecessary conversions were partly due to ignoring types (even the types of literals, which you don't need Typescript/Flow to provide).</p> <p>After I got this proof-of-concept working for basic examples, I wanted to rewrite it around <a href="https://github.com/eatonphil/one-pass-code-generation-in-v8/blob/master/One-pass%20Code%20Generation%20in%20V8.pdf">destination-driven code generation</a>, a technique by Kent Dybvig used in V8's baseline compiler. After a few weeks of not getting far with a refactor in Rust, I rewrote the compiler in Typescript.</p> <h3 id="the-second-pass">The second pass</h3><p>Written in Typescript and using the <a href="https://github.com/Microsoft/TypeScript/wiki/Using-the-Compiler-API">Typescript compiler API</a>, this second iteration was built to do destination-driven code generation and leaf type propagation. Destination-driven code generation allows a single-pass code generator to reduce redundant reassignments by passing each expression the destination its result should be written to. 
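</p> <p>To make the destination idea concrete, here is a minimal sketch of the data-destination half of destination-driven code generation. It is not jsc's actual code: the <code>Expr</code> type and the <code>freshTmp</code>, <code>compileNaive</code>, <code>compileTo</code>, and <code>operand</code> helpers are hypothetical names made up for this illustration, and the "generated code" is just a list of strings. It compares a naive single pass, where every node gets its own temporary, with a pass where the caller names the destination.</p>

```typescript
// Hypothetical mini-AST for illustration only: literals, identifiers, "+".
type Expr =
  | { kind: "num"; value: number }
  | { kind: "id"; name: string }
  | { kind: "add"; left: Expr; right: Expr };

let tmpCount = 0;
function freshTmp(): string {
  tmpCount += 1;
  return "tmp_" + tmpCount;
}

// Naive single pass: every node writes into a fresh temporary and returns
// its name, so the caller must copy the result to where it is needed.
function compileNaive(e: Expr, out: string[]): string {
  const t = freshTmp();
  if (e.kind === "num") {
    out.push(t + " = " + e.value + ";");
  } else if (e.kind === "id") {
    out.push(t + " = " + e.name + ";");
  } else {
    const l = compileNaive(e.left, out);
    const r = compileNaive(e.right, out);
    out.push(t + " = " + l + " + " + r + ";");
  }
  return t;
}

// Destination-driven: the caller passes the destination down, so the node
// writes its result there directly and no copy is emitted afterwards.
function compileTo(e: Expr, dest: string, out: string[]): void {
  if (e.kind === "num") {
    out.push(dest + " = " + e.value + ";");
  } else if (e.kind === "id") {
    out.push(dest + " = " + e.name + ";");
  } else {
    const l = operand(e.left, out);
    const r = operand(e.right, out);
    out.push(dest + " = " + l + " + " + r + ";");
  }
}

// Leaves are usable as operands as-is; only nested operations need a temporary.
function operand(e: Expr, out: string[]): string {
  if (e.kind === "num") return String(e.value);
  if (e.kind === "id") return e.name;
  const t = freshTmp();
  compileTo(e, t, out);
  return t;
}

// x = a + 1
const expr: Expr = {
  kind: "add",
  left: { kind: "id", name: "a" },
  right: { kind: "num", value: 1 },
};

const naive: string[] = [];
const naiveResult = compileNaive(expr, naive);
naive.push("x = " + naiveResult + ";");
// naive: tmp_2 = a; tmp_3 = 1; tmp_1 = tmp_2 + tmp_3; x = tmp_1;

const direct: string[] = [];
compileTo(expr, "x", direct);
// direct: x = a + 1;
```

<p>The full technique also threads a control destination (where to jump next) through the generator, which this sketch leaves out.</p> <p>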
And leaf type propagation allows simple, obvious optimizations such as just calling <code>V8::Boolean::IsTrue()</code> on a statically-known boolean rather than calling <code>V8::Value::Equals()</code>.</p> <h4 id="example">Example</h4><p>Given the same fibonacci Javascript program from before, this iteration produces the following C++:</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&quot;lib.cc&quot;</span> <span class="kt">void</span><span class="w"> </span><span class="nf">tco_fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">_args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">args</span><span class="p">(</span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">());;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span 
class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="nl">tail_recurse_0</span><span class="p">:</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_rhs_4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Boolean</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_anon_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_4</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">);</span> <span 
class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_2</span><span class="o">-&gt;</span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_rhs_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Boolean</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_anon_9</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_11</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span 
class="n">isolate</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_9</span><span class="o">-&gt;</span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_rhs_19</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">genericMinus</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span><span class="w"> </span><span class="n">sym_rhs_19</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span 
class="n">sym_arg_21</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">genericPlus</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span> <span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_arg_17</span><span class="p">;</span> <span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span> <span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_arg_21</span><span class="p">;</span> <span class="w"> </span><span class="k">goto</span><span class="w"> </span><span class="n">tail_recurse_0</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">_args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span 
class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o">&lt;</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&gt;</span><span class="w"> </span><span class="n">args</span><span class="p">(</span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">());;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kt">int</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mi">0</span><span class="p">;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="n">_args</span><span class="p">.</span><span class="n">Length</span><span class="p">();</span><span class="w"> </span><span class="n">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="n">i</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">_args</span><span class="p">[</span><span class="n">i</span><span class="p">];</span> <span class="nl">tail_recurse_1</span><span class="p">:</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_29</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span 
class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">100</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_30</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">0</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_31</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_args_32</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">sym_arg_29</span><span class="p">,</span><span class="w"> </span><span class="n">sym_arg_30</span><span class="p">,</span><span class="w"> </span><span class="n">sym_arg_31</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_fn_33</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">tco_fib</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">sym_fn_33</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;tco_fib&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_arg_28</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_fn_33</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">sym_fn_33</span><span class="p">,</span><span class="w"> </span><span class="mi">3</span><span class="p">,</span><span class="w"> </span><span class="n">sym_args_32</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_args_34</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">sym_arg_28</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_parent_37</span><span class="w"> </span><span 
class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;console&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_anon_36</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_parent_37</span><span class="p">.</span><span class="n">As</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;log&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_fn_35</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">sym_anon_36</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span 
class="o">&gt;</span><span class="w"> </span><span class="n">sym_anon_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sym_fn_35</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">sym_fn_35</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">sym_args_34</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;jsc_main&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span> <span class="p">}</span> <span class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span> </pre></div> <h4 id="analysis">Analysis</h4><p>Common code (<code>genericPlus</code>, <code>genericMinus</code>) and all imports have been pulled into <code>lib.cc</code> for clarity. 
And the entire result is run through <a href="https://clang.llvm.org/docs/ClangFormat.html">clang-format</a> if it is present on the system.</p> <p>The benefit of leaf type propagation can be seen everywhere a local is declared that is not <code>Local&lt;Value&gt;</code> and specifically in if tests on statically known booleans:</p> <div class="highlight"><pre><span></span><span class="p">...</span> <span class="n">Local</span><span class="o">&lt;</span><span class="n">Boolean</span><span class="o">&gt;</span><span class="w"> </span><span class="n">sym_anon_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">-&gt;</span><span class="n">StrictEquals</span><span class="p">(</span><span class="n">sym_rhs_4</span><span class="p">)</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">True</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">);</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">sym_anon_2</span><span class="o">-&gt;</span><span class="n">IsTrue</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="p">...</span> </pre></div> <p>It's obvious to a human that there is another optimization you could do here by not wrapping this check in a <code>V8::Boolean</code> at all. The only types tracked in destinations are V8 types, not yet C++ types. But not needing to pass this through a <code>bool toBoolean(Value v)</code> wrapper is still an improvement.</p> <p>In general, unboxing has not really been explored. 
But the ultimate goal is to use TypeScript types to produce function- or block-level unboxed versions -- perhaps using a toggle in code to specify safety à la Common Lisp.</p> <h3 id="next-steps">Next steps</h3><p>I broke tests and regressed on syntax support in the TypeScript port, so fixing those is the first step. The second step is enough syntax to support more interesting benchmarks than the Fibonacci example (where performance is comparable to Node.js/V8, though that isn't saying much).</p> <p>After that:</p> <ul> <li>Unboxed expressions</li> <li>Unboxed blocks</li> <li>Foreign-function interface</li> <li>Self-hosting</li> <li>Node-API compatible runtime without Node</li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Companion blog post to my talk on an AOT-compiled Javascript implementation built on Typescript <a href="https://t.co/0aHVJ9UzYh">https://t.co/0aHVJ9UzYh</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1100397733867859968?ref_src=twsrc%5Etfw">February 26, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/aot-compilation-of-javascript-with-v8.htmlTue, 26 Feb 2019 00:00:00 +0000Transparency and communication on small teamshttp://notes.eatonphil.com/transparency-and-communication-on-small-teams.html<p>I saw a post on <a href="https://dev.to/vcarl/symptoms-of-a-dysfunctional-team-1c0">dev.to</a> that talks about dysfunctional teams. This is a response that focuses specifically on how to prevent burnout from overworking. This is aimed at senior/lead engineers and engineering/project managers -- because everyone in a leadership role is responsible for the health of the team and the company.</p> <p>In an otherwise good company with hard-working, ethical employees, overworking happens because of imperfect communication. 
If neither of those premises holds, you have more serious issues and have no need for this post.</p> <p>The primary subjects of poor communication are:</p> <ul> <li>Capacity/capabilities</li> <li>Priorities</li> <li>Results</li> </ul> <p>If any member of the team (or worse, the entire team) is not honestly reporting on their capacity and capability, this will drive them to overwork to make up for what they couldn't accomplish during work hours.</p> <p>If any member of the team (or worse, the entire team) is not honestly and publicly reporting on what they understand to be the priorities, they will end up needing to work overtime if true priorities become apparent too late.</p> <p>And if any member of the team (or worse, the entire team) is not honestly and publicly reporting on what they <strong>accomplished</strong>, they will end up needing to work overtime if discrepancies become apparent too late.</p> <h3 id="solution">Solution</h3><p>Put a sprint process in place and schedule <strong>at least</strong> one meeting every sprint. Discover every political, technical, and structural stakeholder and find a time they can attend this meeting. At this meeting you will cover at a high level (perhaps with some demos) what was accomplished in the sprint and what you intend to accomplish in the next sprint.</p> <p>If any stakeholder cannot make this meeting, find a time to sync up with them separately.</p> <p>Your sprints should not last more than two weeks because any longer is too long to go before talking to/reviewing with your stakeholders.</p> <p>Finally, publish a report on what you accomplished this sprint (and also what you did not accomplish!) and what you plan to accomplish the next sprint. For example, I send an email to the engineering organization with two docs at the end of each sprint: 1) a review doc listing tasks accomplished/not accomplished and 2) a list of tasks planned for the next sprint. 
This gives your stakeholders (and anyone else interested) an opportunity to review the contents of the meeting at their leisure.</p> <p>Doing this can be difficult and embarrassing at first. Hard-working, ethical employees never want to be seen as not accomplishing their share of work. But the most important thing for the mid-to-long-term health of these employees is to get them reporting honestly.</p> <p>This helps make it clear where these employees can legitimately improve (i.e. receive more training) and where it's necessary to hire more or different employees. You'll likely need to put pressure on every team member to report honestly and to do so without fear.</p> <p>And as a result of doing this, you've done everything you can as a senior/lead member of a small team to push responsibility for your team's work up to your stakeholders. This is the best position to be in.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">tldr; don&#39;t let your folks overwork unnecessarily when you could be reporting more frequently/honestly on understood priorities and accomplishments achieved/not achieved <a href="https://t.co/PeTe2Bq0Xz">https://t.co/PeTe2Bq0Xz</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1087722536236957697?ref_src=twsrc%5Etfw">January 22, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/transparency-and-communication-on-small-teams.htmlTue, 22 Jan 2019 00:00:00 +0000Windowshttp://notes.eatonphil.com/windows.html<p>It has been six years since I last used Windows for any remotely serious software development. I've used Ubuntu, Arch, or FreeBSD since. 
But eventually I spent so much time working around common workplace tasks that I decided to put Windows 10 Pro on my work laptop.</p> <h3 id="windows-subsystem-for-linux">Windows Subsystem for Linux</h3><p>Introduced in 2016, this technology allows Windows to run unmodified Linux binaries. The core feat is <a href="https://blogs.msdn.microsoft.com/wsl/2016/06/08/wsl-system-calls/">syscall translation</a>.</p> <p>It works nearly flawlessly. This means I can do all my Go, Node, PostgreSQL development on Windows without a virtual machine using bash, tmux, git, emacs, etc.</p> <p>I've seen a few minor exceptions over the course of regular software development in WSL:</p> <ul> <li><a href="https://github.com/Microsoft/WSL/issues/2249">ss/netstat does not work</a></li> <li><a href="https://github.com/hashicorp/vagrant/issues/8700">vagrant does not work</a></li> </ul> <p>More generally, Linux programs are heavily file-oriented. And Windows I/O <a href="https://github.com/Microsoft/WSL/issues/873#issuecomment-425272829">is not designed well for that</a>. In the worst cases (installing/adding Node packages) it can take minutes to do operations that would take Linux seconds.</p> <h3 id="vagrant">Vagrant</h3><p>Vagrant-Windows interoperability is abysmal.</p> <p>As noted above, you cannot manage Hyper-V from vagrant within WSL. So you're stuck using PowerShell. Even then, managing synced files from vagrant is a nightmare. The default sync method requires you to sign in using your <strong>Windows Live</strong> username and password on every reboot. But Node package installation attempts some file operations that are not supported over the default synced, network filesystem.</p> <p>When I switched to rsync, vagrant wouldn't reliably sync when the virtual machine went down and came back up.</p> <p>After hours of trying to get some files synced with vagrant I gave up.</p> <h3 id="hyper-v">Hyper-V</h3><p>Hyper-V's GUI is much more complex/feature-complete than VirtualBox. 
It even provides an Ubuntu quick-install that I used to jump right in. I don't recommend using this though because it gives you no option but an 11GB hard disk. I didn't realize this until I went through an hour or two of post-install customization only to run out of space. Too lazy to boot into a live CD to grow the root filesystem, I reinstalled with a more suitable 64GB drive and went through the hour-long post-install customization process again.</p> <p>Networking in Hyper-V is more complex/feature-complete than VirtualBox as well. To access a Hyper-V machine you must create a new virtual network interface manually and associate it. Static IP addresses appear to be controlled at the host networking level (e.g. Control Panel) instead of within the Hyper-V interface. This highlights how these virtual interfaces are first-class, but overcomplicates the process of getting started.</p> <p>Ultimately I gave up on a static IP address and decided to reboot less frequently.</p> <p>Performance-wise Hyper-V machines are exactly as expected: excellent.</p> <h3 id="misc">Misc</h3><p>Docker support on Windows needs work. It took me a while to understand how Docker interacts with the WSL filesystem and what I needed to do to allow Docker to mount. The complexity is similar on macOS when you want to mount privileged directories like /var, but the experience is worse on Windows.</p> <p>Apparently Windows does have tiling window managers, but I have not tried one out yet.</p> <p>PowerShell, a language with real types, is pretty compelling. But I have not spent enough time with it to be efficient. And since WSL is mostly good enough I don't really plan to.</p> <p>Windows doesn't allow you to delete any files that are "in use". This is kinda cool except that the errors you get when trying to delete files that are in use are useless. 
They are even more useless when you get the plain "could not delete directory" when you try to delete a directory with some file inside it that is in use. I had to start deleting files within by hand until I found the one I realized was in use.</p> <h3 id="conclusion">Conclusion</h3><p>If you have never run Linux or FreeBSD, don't use this post as an excuse not to. You should run Linux or FreeBSD for the experience. But if you've reached diminishing returns in your Linux/FreeBSD use, Windows as a development environment has come a long way. It may be the best platform available for software development, the profession.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Some notes on my experience having replaced Arch Linux with Windows on my work laptop <a href="https://t.co/8asxZmspwR">https://t.co/8asxZmspwR</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1086994000182153222?ref_src=twsrc%5Etfw">January 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/windows.htmlSun, 20 Jan 2019 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 2. user-defined functions and variableshttp://notes.eatonphil.com/compiler-basics-functions.html<p class="note"> Previously in compiler basics: <! forgive me, for I have sinned > <br /> <a href="/compiler-basics-lisp-to-assembly.html">1. lisp to assembly</a> <br/> <br/> Next in compiler basics: <br/> <a href="/compiler-basics-llvm.html">3. LLVM</a> <br /> <a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a> <br /> <a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a> <br /> <a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a> </p><p>In this post we'll extend the compiler to support defining functions and variables. 
Additionally, we'll require the program's entrypoint to be within a <code>main</code> function.</p> <p>The resulting code can be found <a href="https://github.com/eatonphil/ulisp">here</a>.</p> <h3 id="function-definition">Function definition</h3><p>The simplest function definition we need to support is for our <code>main</code> function. This will look like this:</p> <div class="highlight"><pre><span></span><span class="nv">$</span><span class="w"> </span><span class="nv">cat</span><span class="w"> </span><span class="nv">basic.lisp</span> <span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">main</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span> </pre></div> <p>Where compiling and running it should produce a return code of 3:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>basic.lisp $<span class="w"> </span>./build/a.out $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">3</span> </pre></div> <h3 id="parsing-function-definitions">Parsing function definitions</h3><p>The entire language is defined in S-expressions and we already parse S-expressions.</p> <div class="highlight"><pre><span></span><span class="nx">$</span><span class="w"> </span><span class="nx">node</span> <span class="o">&gt;</span><span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./parser&#39;</span><span 
class="p">);</span> <span class="o">&gt;</span><span class="w"> </span><span class="nb">JSON</span><span class="p">.</span><span class="nx">stringify</span><span class="p">(</span><span class="nx">parse</span><span class="p">(</span><span class="s1">&#39;(def main () (+ 1 2))&#39;</span><span class="p">));</span> <span class="s1">&#39;[[[&quot;def&quot;,&quot;main&quot;,[],[&quot;+&quot;,1,2]]],&quot;&quot;]&#39;</span> </pre></div> <p>So we're done!</p> <h3 id="code-generation">Code generation</h3><p>There are three tricky parts to code generation once function definitions are introduced:</p> <ul> <li>Function definitions are not expressions (in assembly)</li> <li>Function calling conventions for the <strong>callee</strong></li> <li>Variable scope</li> </ul> <h4 id="function-definitions">Function definitions</h4><p>A function definition looks like a function call. So we'll need to keep a list of "primitive" functions that handle what looks like function calls differently.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// TODO</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span class="nx">primitive_functions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">,</span> <span class="p">};</span> </pre></div> <p>Then in our <code>compile_call</code> function we need to see if the function being "called" is in this list. 
If so, we allow the associated callback to handle compilation.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Save param registers</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Compile registers and store as 
params</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">));</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Restore param registers</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> 
</span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Now we can begin thinking about <code>compile_define</code>. It takes <code>args</code> which will be a list of three elements containing the function's:</p> <ul> <li>name</li> <li>parameters</li> <li>and body</li> </ul> <p class="note"> It does not use destination because we're treating function definitions as statements for now and not as expressions. If we were treating it as an expression, we might store the address of the function in the destination register. We keep destination around to keep the primitive function signatures consistent. </p><p>Based on how we called functions before and how we defined the hard-coded <code>add</code> function, we know what a function definition in assembly generally looks like. 
And we know the arguments to the function when called will be in RDI, RSI, and RDX.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">parameters</span><span class="p">,</span><span class="w"> </span><span class="nx">body</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Function name becomes a label we can CALL</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">name</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Something to do with RDI, RSI, RDX and the parameters variable?</span> <span class="w"> </span><span class="c1">// We renamed compile_argument to compile_expression to be more general</span> <span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Maybe some cleanup to do with RDI, RSI, RDX?</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Not a bad first sketch. But how do we match up <code>RDI</code>, <code>RSI</code>, <code>RDX</code> and the user-defined <code>parameters</code> variable names? 
For example in the following:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span> </pre></div> <p>It's clear to us that <code>a</code> must match up to <code>RDI</code>. In order to do this we need to track all variables in a <code>scope</code> dictionary mapping the variable name to the register where it's stored.</p> <p>Additionally, keeping track of scope can help us fail quickly in the following scenario:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">+</span><span class="w"> </span><span class="nv">b</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span> </pre></div> <p>The variable <code>b</code> is used but never defined. It has not been added to the scope dictionary. 
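</p> <p>As a minimal sketch of that check (the <code>lookup</code> helper and the sample scope below are hypothetical, for illustration only, and are not the compiler's actual code), a scope dictionary might be consulted like this:</p> <div class="highlight"><pre>
// Hypothetical helper for illustration only; not part of the compiler itself.
function lookup(scope, name) {
  if (Object.prototype.hasOwnProperty.call(scope, name)) {
    return scope[name];
  }
  throw new Error('Attempt to reference undefined variable: ' + name);
}

// While compiling plus-two, only the parameter a is in scope (assumed to live in RDI).
const plusTwoScope = { a: 'RDI' };

console.log(lookup(plusTwoScope, 'a')); // a was defined, so we get its register back

try {
  lookup(plusTwoScope, 'b'); // b was never added to the scope
} catch (e) {
  console.log(e.message);
}
</pre></div> <p>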
So our compiler can fail quickly, reporting that an undefined variable is being referenced.</p> <p>Taking this a step further, what if we want to catch the following too:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">def</span><span class="w"> </span><span class="nv">plus-two</span><span class="w"> </span><span class="p">(</span><span class="nv">a</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nv">plus</span><span class="w"> </span><span class="nv">a</span><span class="w"> </span><span class="mi">2</span><span class="p">))</span> </pre></div> <p>We're trying to call <code>plus</code> but it has not been defined. We should be able to fail quickly here too. But that means we need to track the scope of function <strong>names</strong> in addition to variables. We'll choose to track function names and variable names in the same scope dictionary.</p> <p class="note"> This is the distinction between a lisp-1 and a lisp-2. We are a lisp-1 like Scheme because we have a single scope. Common Lisp is a lisp-2 because it stores function name scope separately from variable name scope. </p><h3 id="implementing-scope">Implementing scope</h3><p>We need to revise every compile function to accept a scope dictionary (specifically: <code>compile</code>, <code>compile_expression</code>, <code>compile_call</code>, and <code>compile_define</code>). If a variable is referenced, we need to look up its location in the scope dictionary. If a variable is defined (e.g.
a function name or a function parameter) we need to add a mapping to the scope dictionary.</p> <p>Modifying <code>compile_expression</code> is easiest:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Is a nested function call, compile it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">scope</span><span class="p">[</span><span class="nx">arg</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nb">Number</span><span class="p">.</span><span class="nx">isInteger</span><span 
class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">arg</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Attempt to reference undefined variable or unsupported literal: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">arg</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>Next we modify <code>compile_call</code>:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span 
class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">])</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">primitive_functions</span><span class="p">[</span><span class="nx">fun</span><span class="p">](</span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Save param registers</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Compile registers and store as params</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span 
class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">],</span><span class="w"> </span><span class="nx">scope</span><span class="p">));</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">validFunction</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">fun</span><span class="p">];</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">validFunction</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">validFunction</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">throw</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="ne">Error</span><span class="p">(</span><span class="s1">&#39;Attempt to call undefined function: &#39;</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fun</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Restore param registers</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">map</span><span class="p">((</span><span class="nx">_</span><span 
class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> </pre></div> <p>And then <code>compile_define</code> where we modify scope for the first time:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span 
class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="nx">scope</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w">
</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="w"> </span><span class="c1">// Store parameter mapped to associated register</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">register</span><span class="p">;</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span> <span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And finally we need to modify the entrypoint <code>compile</code>:</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span 
class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span> <span class="w"> </span><span class="c1">// Pass in new, empty scope mapping</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span> <span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span> <span class="p">}</span> </pre></div> <p>And scope-wise we're pretty good!</p> <h3 id="function-calling-convention:-callee">Function calling convention: callee</h3><p>We currently have a problem: we're using the parameter registers to store local variables, which conflicts with how we store parameters for function calls made within the function itself.</p> <p>Ideally we could store function-local variables (including the parameters when we get them) separately from the parameters we pass to function calls within the function.</p> <p>Thankfully, according to the calling convention we've followed, we're given a set of registers that are callee-preserved. Of them we'll use <code>RBX</code>, <code>RBP</code>, and <code>R12</code> in that order.
This allows us to mess with them so long as we save and restore them within the function.</p> <p>Applying the same storing/restoring strategy to local variables as we did for parameters, we get:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span> <span class="w"> </span><span class="s1">&#39;RBX&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;RBP&#39;</span><span class="p">,</span> <span class="w"> </span><span class="s1">&#39;R12&#39;</span><span class="p">,</span> <span class="p">];</span> <span class="kd">function</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">([</span><span class="nx">name</span><span class="p">,</span><span class="w"> </span><span class="nx">params</span><span class="p">,</span><span class="w"> </span><span class="p">...</span><span class="nx">body</span><span class="p">],</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Add this function to outer scope</span> <span class="w"> </span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">name</span><span class="p">.</span><span class="nx">replace</span><span class="p">(</span><span class="sr">/-/g</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Copy outer scope so parameter mappings aren&#39;t exposed in outer scope.</span> <span class="w"> </span><span class="kd">const</span><span
class="w"> </span><span class="nx">childScope</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">...</span><span class="nx">scope</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span class="nx">scope</span><span class="p">[</span><span class="nx">name</span><span class="p">]</span><span class="si">}</span><span class="sb">:`</span><span class="p">);</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">register</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">];</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span 
class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">register</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Store parameter mapped to associated local</span> <span class="w"> </span><span class="nx">childScope</span><span class="p">[</span><span class="nx">param</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">local</span><span class="p">;</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="c1">// Pass childScope in for reference when body is compiled.</span> <span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">body</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">childScope</span><span class="p">);</span> <span class="w"> </span><span class="nx">params</span><span class="p">.</span><span class="nx">forEach</span><span class="p">(</span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">param</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Backwards first</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">local</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">LOCAL_REGISTERS</span><span class="p">[</span><span class="nx">params</span><span 
class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">];</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">local</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="p">});</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And we're set.</p> <h3 id="cleanup">Cleanup</h3><p>We've still got a few messes going on:</p> <ul> <li><code>emit_prefix</code> wraps our entire body in <code>_main</code>, but we now require our own <code>main</code></li> <li>we emit to stdout instead of to a file</li> <li>multiple function definitions are treated as nonsense</li> </ul> <p>Starting with the first, we rewrite <code>emit_prefix</code> and <code>emit_postfix</code> so that our <code>_main</code> just calls <code>main</code>.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.global _main\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.text\n&#39;</span><span
class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;plus:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ADD RDI, RSI&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;MOV RAX, RDI&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;_main:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;CALL main&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;MOV RDI, RAX&#39;</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span 
class="si">${</span><span class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">&#39;exit&#39;</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;SYSCALL&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>Next to deal with writing to a file instead of stdout, we need our <code>emit</code> function to write to a buffer. We'll let <code>ulisp.js</code> write that buffer to a file. Because we're incredibly lazy, we'll do this all globally.</p> <div class="highlight"><pre><span></span><span class="kd">let</span><span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="kd">function</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="ow">new</span><span class="w"> </span><span class="nb">Array</span><span class="p">(</span><span class="nx">depth</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">).</span><span class="nx">join</span><span class="p">(</span><span class="s1">&#39; &#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="sb">`</span><span class="si">${</span><span 
class="nx">indent</span><span class="si">}${</span><span class="nx">args</span><span class="si">}</span><span class="sb">\n`</span><span class="p">;</span> <span class="p">}</span> <span class="p">...</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span> <span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">OUT</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And modify <code>ulisp.js</code>:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">cp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;child_process&#39;</span><span class="p">);</span> <span 
class="kd">const</span><span class="w"> </span><span class="nx">fs</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;fs&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./parser&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">compile</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./compiler&#39;</span><span class="p">);</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="w"> </span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">input</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">readFileSync</span><span class="p">(</span><span class="nx">args</span><span class="p">[</span><span class="mf">2</span><span class="p">]).</span><span class="nx">toString</span><span class="p">();</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span 
class="nx">input</span><span class="p">);</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">);</span> <span class="w"> </span><span class="k">try</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">mkdirSync</span><span class="p">(</span><span class="s1">&#39;build&#39;</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">catch</span><span class="w"> </span><span class="p">(</span><span class="nx">e</span><span class="p">)</span><span class="w"> </span><span class="p">{}</span> <span class="w"> </span><span class="nx">fs</span><span class="p">.</span><span class="nx">writeFileSync</span><span class="p">(</span><span class="s1">&#39;build/prog.s&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">);</span> <span class="w"> </span><span class="nx">cp</span><span class="p">.</span><span class="nx">execSync</span><span class="p">(</span><span class="s1">&#39;gcc -mstackrealign -masm=intel -o build/a.out build/prog.s&#39;</span><span class="p">);</span> <span class="p">}</span> <span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">);</span> </pre></div> <p>And we're finally ready to run a simple program.</p> <h3 id="a-program!">A program!</h3><div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp <span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span><span class="w"> </span><span class="o">(</span>+<span class="w"> </span><span 
class="m">1</span><span class="w"> </span><span class="m">2</span><span class="o">))</span> $<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp $<span class="w"> </span>./build/a.out $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">3</span> </pre></div> <p>Hurray! Now let's try defining and calling a second function to validate parameter behavior.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp <span class="o">(</span>def<span class="w"> </span>plus-two<span class="w"> </span><span class="o">(</span>a<span class="o">)</span> <span class="w"> </span><span class="o">(</span>+<span class="w"> </span>a<span class="w"> </span><span class="m">2</span><span class="o">))</span> <span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span> <span class="w"> </span><span class="o">(</span>plus-two<span class="w"> </span><span class="m">3</span><span class="o">))</span> $<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp $<span class="w"> </span>./build/a.out ./compiler.js:106 <span class="w"> </span>throw<span class="w"> </span>new<span class="w"> </span>Error<span class="o">(</span><span class="s1">&#39;Attempt to call undefined function: &#39;</span><span class="w"> </span>+<span class="w"> </span>fun<span class="o">)</span><span class="p">;</span> <span class="w"> </span>^ Error:<span class="w"> </span>Attempt<span class="w"> </span>to<span class="w"> </span>call<span class="w"> </span>undefined<span class="w"> </span><span class="k">function</span>:<span class="w"> </span>p2 ... </pre></div> <p>We start getting some really weird errors. 
The reason is that our compiler doesn't know how to deal with sibling S-expressions.</p> <p>So we'll introduce a new primitive function called <code>begin</code> that calls all its sibling functions and returns the value of the last call. Then we'll wrap the program in an implicit <code>begin</code> so we don't need to write it ourselves.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_begin</span><span class="p">(</span><span class="nx">body</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">body</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">expression</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">compile_expression</span><span class="p">(</span><span class="nx">expression</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">scope</span><span class="p">));</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="nx">destination</span><span class="w"> </span><span class="o">!==</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span 
class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span class="nx">primitive_functions</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">def</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_define</span><span class="p">,</span> <span class="w"> </span><span class="nx">begin</span><span class="o">:</span><span class="w"> </span><span class="nx">compile_begin</span><span class="p">,</span> <span class="p">};</span> <span class="p">...</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">OUT</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="s1">&#39;begin&#39;</span><span class="p">,</span><span class="w"> </span><span class="nx">ast</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RAX&#39;</span><span class="p">,</span><span class="w"> </span><span class="p">{});</span> <span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">OUT</span><span class="p">;</span> <span class="p">}</span> </pre></div> <p>And 
we try our test program again. :)</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>test.lisp <span class="o">(</span>def<span class="w"> </span>plus-two<span class="w"> </span><span class="o">(</span>a<span class="o">)</span> <span class="w"> </span><span class="o">(</span>+<span class="w"> </span>a<span class="w"> </span><span class="m">2</span><span class="o">))</span> <span class="o">(</span>def<span class="w"> </span>main<span class="w"> </span><span class="o">()</span> <span class="w"> </span><span class="o">(</span>plus-two<span class="w"> </span><span class="m">3</span><span class="o">))</span> $<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span>test.lisp $<span class="w"> </span>./build/a.out $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">5</span> </pre></div> <p>And that's all there is to it! Stay tuned for the next post on conditionals and tail-call optimization.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Part two on compiler basics using JavaScript: user-defined functions and variables <a href="https://t.co/XOam67HO8h">https://t.co/XOam67HO8h</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1087103061590446083?ref_src=twsrc%5Etfw">January 20, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/compiler-basics-functions.htmlSun, 20 Jan 2019 00:00:00 +0000Make small changes and solve the problems you havehttp://notes.eatonphil.com/make-small-changes-and-solve-the-problems-you-have.html<p>Two frustrating things that can happen in an organization are 1) big changes and 2) changes that aren’t clearly associated with a known problem. It’s even worse in that order.</p> <p>These situations tend to happen when a problem remains unaddressed for too long. 
They also tend to happen when there is not a strong enough emphasis on respect for all employees -- their experience, ideas, and feelings.</p> <p>I try to avoid these issues in teams I run by starting early with a problem statement. Specifically, when there’s a problem I’d like to solve, I’ll mention it in our fortnightly team retro. If there’s general agreement a problem exists, we begin looking for the least invasive/least effort way to fix the problem. More on that later.</p> <p>If the problem is not well understood or widely enough shared, I’ll table the discussion until I can talk with more people to better articulate the problem. Or maybe there isn’t a problem after all.</p> <p>This process of clarifying and agreeing that a problem exists is the only appropriate first step when making a change. It is important to provide sufficient context to affected employees.</p> <p>After the problem is understood I begin to suggest possible solutions -- soliciting feedback and alternatives. But making sure a problem is well understood is not the same thing as making sure that potential solutions could reasonably solve the problem. Throughout the discussion of solutions I try to repeatedly make sure that proposed solutions could actually address the problem.</p> <p>From there I try to steer discussion of solutions to ones that are easiest to make and least invasive. Small changes are easier to make. There is little room for disagreement when there is little changing.</p> <p>Making small changes among a small group of people is even easier. The few disagreements that you find when making small changes among a small group of people give you a chance to prove or improve the solution before introducing it to a larger group.</p> <p>Communicating frequently and effectively should be a clear theme here.</p> <p>At this point if there is a single most reasonable solution, I’ll pick it unless there is serious disagreement. 
Most of the time folks are amenable to the need for a solution to be chosen to solve a problem they agreed existed, even if they don’t love the solution.</p> <p>If there is no clear solution or there is serious disagreement, go back a few paragraphs and start over to understand the problem and solicit feedback and alternatives for solutions. Or take the heat of serious disagreement.</p> <p>This is a philosophy. It’s difficult to prove the effectiveness one way or the other -- especially over the mid-to-long-term. But the logic makes sense to me, it agrees with what I’ve read on management, and has worked effectively in teams I’ve run so far.</p> <p>Further reading:</p> <ul> <li><a href="https://amzn.to/2GHlro5">Peopleware: Productive Projects and Teams</a></li> <li><a href="https://amzn.to/2BGEysM">Managing Transitions: Making the Most of Change</a></li> <li><a href="https://amzn.to/2LA34Ar">Thinking, Fast and Slow</a></li> <li><a href="https://amzn.to/2LDfQOz">Site Reliability Engineering: How Google Runs Production Systems</a></li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a post expanding on a side of this: make small changes and solve the problems you have <a href="https://t.co/FXepELSHMx">https://t.co/FXepELSHMx</a> <a href="https://t.co/mVsT1KFhKc">https://t.co/mVsT1KFhKc</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1078312937348059136?ref_src=twsrc%5Etfw">December 27, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/make-small-changes-and-solve-the-problems-you-have.htmlThu, 27 Dec 2018 00:00:00 +0000Writing a lisp compiler from scratch in JavaScript: 1. lisp to assemblyhttp://notes.eatonphil.com/compiler-basics-lisp-to-assembly.html<p class="note"> Next in compiler basics: <! forgive me, for I have sinned > <br /> <a href="/compiler-basics-functions.html">2. 
user-defined functions and variables</a> <br /> <a href="/compiler-basics-llvm.html">3. LLVM</a> <br /> <a href="/compiler-basics-llvm-conditionals.html">4. LLVM conditionals and compiling fibonacci</a> <br /> <a href="/compiler-basics-llvm-system-calls.html">5. LLVM system calls</a> <br /> <a href="/compiler-basics-an-x86-upgrade.html">6. an x86 upgrade</a> </p><p>In this post we'll write a simple compiler in JavaScript (on Node) without any third-party libraries. Our goal is to take an input program like <code>(+ 1 (+ 2 3))</code> and produce an output assembly program that does these operations to produce <code>6</code> as the exit code. The resulting compiler can be found <a href="https://github.com/eatonphil/ulisp">here</a>.</p> <p>We'll cover:</p> <ul> <li>Parsing</li> <li>Code generation</li> <li>Assembly basics</li> <li>Syscalls</li> </ul> <p>And for now we'll omit:</p> <ul> <li>Programmable function definitions</li> <li>Non-symbol/-numeric data types</li> <li>More than 3 function arguments</li> <li>Lots of safety</li> <li>Lots of error messages</li> </ul> <h3 id="parsing">Parsing</h3><p>We pick the <a href="https://en.wikipedia.org/wiki/S-expression">S-expression</a> syntax mentioned earlier because it's very easy to parse. Furthermore, our input language is so limited that we won't even break our parser into separate lexing/parsing stages.</p> <p class="note"> Once you need to support string literals, comments, decimal literals, and other more complex literals it becomes easier to use separate stages. <br /> <br /> If you're curious about these separate stages of parsing, you may be interested in my post on <a href="http://notes.eatonphil.com/writing-a-simple-json-parser.html">writing a JSON parser</a>. <br /> <br /> Or, check out my BSDScheme project for a fully-featured <a href="https://github.com/eatonphil/bsdscheme/blob/master/src/lex.d">lexer</a> and <a href="https://github.com/eatonphil/bsdscheme/blob/master/src/parse.d">parser</a> for Scheme. 
</p><p>The parser should produce an Abstract Syntax Tree (AST), a data structure representing the input program. Specifically, we want <code>(+ 1 (+ 2 3))</code> to produce <code>['+', 1, ['+', 2, 3]]</code> in JavaScript.</p> <p>There are many different ways to go about parsing but the most intuitive to me is to have a function that accepts a program (a string) and returns a tuple containing the program parsed so far (an AST) and the rest of the program (a string) that hasn't been parsed.</p> <p>That leaves us with a function skeleton that looks like this:</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="p">...</span><span class="w"> </span><span class="nx">logic</span><span class="w"> </span><span class="nx">to</span><span class="w"> </span><span class="nx">be</span><span class="w"> </span><span class="nx">added</span><span class="w"> </span><span class="p">...</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">];</span> <span class="p">};</span> </pre></div> <p>The code that initially calls parse will thus have to deal with unwrapping the outermost tuple to get to the AST. 
For a more helpful compiler we could check that the entire program <em>was</em> actually parsed by failing if the second element of the return result is not the empty string.</p> <p>Within the function we will iterate over each character and accumulate until we hit space, left or right parenthesis:</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span 
class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;(&#39;</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;)&#39;</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39; &#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span> <span class="w"> 
</span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">];</span> <span class="p">};</span> </pre></div> <p>The recursive parts are always the most challenging. The right paren is easiest. We must push the current token and return all tokens with the rest of the program:</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span 
class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;(&#39;</span><span class="o">:</span><span class="w"> </span><span class="c1">// TODO</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;)&#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)];</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39; 
&#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">];</span> <span class="p">};</span> </pre></div> <p>Finally the left paren should recurse, add the parsed tokens to the list of sibling tokens, and force the loop to start at the new unparsed point.</p> <div class="highlight"><pre><span></span><span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">parse</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span 
class="kd">const</span><span class="w"> </span><span class="nx">tokens</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[];</span> <span class="w"> </span><span class="kd">let</span><span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="kd">let</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">length</span><span class="p">;</span><span class="w"> </span><span class="nx">i</span><span class="o">++</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="kr">char</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">charAt</span><span class="p">(</span><span class="nx">i</span><span class="p">);</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="kr">char</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;(&#39;</span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">parsed</span><span class="p">,</span><span class="w"> </span><span 
class="nx">rest</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="nx">parsed</span><span class="p">);</span> <span class="w"> </span><span class="nx">program</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">rest</span><span class="p">;</span> <span class="w"> </span><span class="nx">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="mf">0</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39;)&#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="nx">program</span><span class="p">.</span><span class="nx">substring</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="mf">1</span><span class="p">)];</span> 
<span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="s1">&#39; &#39;</span><span class="o">:</span> <span class="w"> </span><span class="nx">tokens</span><span class="p">.</span><span class="nx">push</span><span class="p">(</span><span class="o">+</span><span class="nx">currentToken</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">currentToken</span><span class="p">);</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="nx">currentToken</span><span class="w"> </span><span class="o">+=</span><span class="w"> </span><span class="kr">char</span><span class="p">;</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="p">[</span><span class="nx">tokens</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">];</span> <span class="p">};</span> </pre></div> <p>Assuming this is all in <code>parser.js</code>, let's try it out in the REPL:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node &gt;<span class="w"> </span>const<span class="w"> </span><span class="o">{</span><span class="w"> </span>parse<span class="w"> </span><span class="o">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span>require<span class="o">(</span><span class="s1">&#39;./parser&#39;</span><span class="o">)</span><span class="p">;</span> undefined &gt;<span class="w"> 
</span>console.log<span class="o">(</span>JSON.stringify<span class="o">(</span>parse<span class="o">(</span><span class="s1">&#39;(+ 3 (+ 1 2))&#39;</span><span class="o">)))</span><span class="p">;</span> <span class="o">[[[</span><span class="s2">&quot;+&quot;</span>,3,<span class="o">[</span><span class="s2">&quot;+&quot;</span>,1,2<span class="o">]]]</span>,<span class="s2">&quot;&quot;</span><span class="o">]</span> </pre></div> <p>Solid. We move on.</p> <h3 id="assembly-101">Assembly 101</h3><p>Assembly is essentially the lowest-level programming language we can use. It is a human-readable, 1:1 representation of the binary instructions the CPU can interpret. Conversion from assembly to binary is done with an assembler; the reverse step is done with a disassembler. We'll use <code>gcc</code> for assembling since it deals with some <a href="http://fabiensanglard.net/macosxassembly/index.php">oddities</a> of assembly programming on macOS.</p> <p>The primary data structures in assembly are registers (temporary variables stored by the CPU) and the program stack. Every function in a program has access to the same registers, but convention cordons off sections of the stack for each function so it ends up being a slightly more durable store than registers. 
<code>RAX</code>, <code>RDI</code>, <code>RDX</code>, and <code>RSI</code> are a few registers available to us.</p> <p>Now we only need to know a few instructions to compile our program (the rest of assembly programming is convention):</p> <ul> <li><code>MOV</code>: store one register's content into another, or store a literal number into a register</li> <li><code>ADD</code>: store the sum of two registers' contents in the first register</li> <li><code>PUSH</code>: store a register's content on the stack</li> <li><code>POP</code>: remove the top-most value from the stack and store it in a register</li> <li><code>CALL</code>: enter a new section of the stack and start running the function</li> <li><code>RET</code>: enter the calling function's stack and return to evaluating from the next instruction after the call</li> <li><code>SYSCALL</code>: like <code>CALL</code> but where the function is handled by the kernel</li> </ul> <h3 id="function-calling-convention">Function calling convention</h3><p>Assembly instructions are flexible enough that there is no language-defined way to make function calls. 
Therefore it is important to answer (at least) the following few questions:</p> <ul> <li>Where are parameters stored by the caller so that the callee has access to them?</li> <li>Where is the return value stored by the callee so the caller has access to it?</li> <li>What registers are saved by whom?</li> </ul> <p>Without getting too far into the specifics, we'll assume the following answers for development on x86_64 macOS and Linux systems:</p> <ul> <li>Parameters are stored (in order) in the <code>RDI</code>, <code>RSI</code>, and <code>RDX</code> registers<ul> <li>We won't support passing more than three arguments</li> </ul> </li> <li>The return value is stored in the <code>RAX</code> register</li> <li>The <code>RDI</code>, <code>RSI</code>, and <code>RDX</code> registers are saved by the caller</li> </ul> <h3 id="code-generation">Code generation</h3><p>With assembly basics and the function call convention in mind, we've got enough to generate code from the parsed program's AST.</p> <p>The skeleton of our compile code will look like this:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="nx">depth</span><span class="p">,</span><span class="w"> </span><span class="nx">code</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">indent</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s1">&#39; &#39;</span><span class="p">.</span><span class="nx">repeat</span><span class="p">(</span><span class="nx">depth</span><span class="p">);</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">indent</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">code</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// If arg AST is a list, call compile_call on it</span> <span class="w"> </span><span class="c1">// Else must be a literal number, store in destination register</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Save param registers to the stack</span> <span class="w"> </span><span class="c1">// Compile arguments and store in param registers</span> <span class="w"> </span><span class="c1">// Call function</span> <span class="w"> </span><span class="c1">// Restore param registers from the stack</span> <span class="w"> </span><span class="c1">// Move result into destination if provided</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> 
</span><span class="c1">// Assembly prefix</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Assembly postfix</span> <span class="p">}</span> <span class="nx">module</span><span class="p">.</span><span class="nx">exports</span><span class="p">.</span><span class="nx">compile</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kd">function</span><span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit_prefix</span><span class="p">();</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">ast</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">));</span> <span class="w"> </span><span class="nx">emit_postfix</span><span class="p">();</span> <span class="p">};</span> </pre></div> <p>The pseudo-code in the comments is simple enough to fill in. 
Let's fill in everything but the prefix and postfix code.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// If arg AST is a list, call compile_call on it</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">Array</span><span class="p">.</span><span class="nx">isArray</span><span class="p">(</span><span class="nx">arg</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">arg</span><span class="p">[</span><span class="mf">0</span><span class="p">],</span><span class="w"> </span><span class="nx">arg</span><span class="p">.</span><span class="nx">slice</span><span class="p">(</span><span class="mf">1</span><span class="p">),</span><span class="w"> </span><span class="nx">destination</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="c1">// Else must be a literal number, store in destination register</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, </span><span class="si">${</span><span class="nx">arg</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="p">}</span> <span class="kd">const</span><span class="w"> </span><span 
class="nx">BUILTIN_FUNCTIONS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="s1">&#39;+&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;plus&#39;</span><span class="w"> </span><span class="p">};</span> <span class="kd">const</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">[</span><span class="s1">&#39;RDI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RSI&#39;</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RDX&#39;</span><span class="p">];</span> <span class="kd">function</span><span class="w"> </span><span class="nx">compile_call</span><span class="p">(</span><span class="nx">fun</span><span class="p">,</span><span class="w"> </span><span class="nx">args</span><span class="p">,</span><span class="w"> </span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="c1">// Save param registers to the stack</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`PUSH </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Compile arguments and store in param 
registers</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">compile_argument</span><span class="p">(</span><span class="nx">arg</span><span class="p">,</span><span class="w"> </span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">i</span><span class="p">]));</span> <span class="w"> </span><span class="c1">// Call function</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`CALL </span><span class="si">${</span><span class="nx">BUILTIN_FUNCTIONS</span><span class="p">[</span><span class="nx">fun</span><span class="p">]</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="nx">fun</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span> <span class="w"> </span><span class="c1">// Restore param registers from the stack</span> <span class="w"> </span><span class="nx">args</span><span class="p">.</span><span class="nx">forEach</span><span class="p">((</span><span class="nx">_</span><span class="p">,</span><span class="w"> </span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">=&gt;</span><span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`POP </span><span class="si">${</span><span class="nx">PARAM_REGISTERS</span><span class="p">[</span><span class="nx">args</span><span class="p">.</span><span class="nx">length</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span 
class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">));</span> <span class="w"> </span><span class="c1">// Move result into destination if provided</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">destination</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV </span><span class="si">${</span><span class="nx">destination</span><span class="si">}</span><span class="sb">, RAX`</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;&#39;</span><span class="p">);</span><span class="w"> </span><span class="c1">// For nice formatting</span> <span class="p">}</span> </pre></div> <p>In a better compiler, we would not make <code>plus</code> a built-in function. We'd emit code for the assembly instruction <code>ADD</code>. However, making <code>plus</code> a function makes code generation simpler and also allows us to see what function calls look like.</p> <p>We'll define the <code>plus</code> built-in function in the prefix code.</p> <h3 id="the-prefix">The prefix</h3><p>Assembly programs consist of a few "sections" in memory. The most important are the <code>text</code> and <code>data</code> sections. <code>text</code> is a read-only section where the program instructions themselves are stored. 
The CPU is instructed to start interpreting from some location in this text section. It steps through the instructions in order, evaluating each one, until it reaches an instruction that tells it to jump to a different location (e.g. <code>CALL</code>, <code>RET</code>, or <code>JMP</code>).</p> <p>To denote the text section we emit <code>.text</code> in our prefix before we emit our generated code.</p> <p class="note"> The data section is for statically initialized values (e.g. global variables). We don't have any need for that right now so we'll ignore it. <br /> <br /> <a href="https://www.cs.bgu.ac.il/~caspl122/wiki.files/lab2/ch07lev1sec6/ch07lev1sec6.html">Here</a> is a good read with more detail on these (and other) sections. </p><p>Additionally, we need to emit an entrypoint (we'll use <code>_main</code>) and add a directive (<code>.global _main</code>) so that the location of this entrypoint is visible externally. This is important because we let <code>gcc</code> handle the hairier parts of generating an executable file and it needs access to the entrypoint.</p> <p>So far, our prefix looks like this:</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.global _main\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.text\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="c1">// TODO: add built-in functions</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> 
</span><span class="s1">&#39;_main:&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>The last part of our prefix needs to include the <code>plus</code> built-in function. For this, we add the first two parameter registers we agreed on (<code>RDI</code> and <code>RSI</code>) and store the result in <code>RAX</code>.</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">emit_prefix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.global _main\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;.text\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;plus:&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;ADD RDI, RSI&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;MOV RAX, RDI&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;RET\n&#39;</span><span class="p">);</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">0</span><span class="p">,</span><span 
class="w"> </span><span class="s1">&#39;_main:&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And we're golden.</p> <h3 id="the-postfix">The postfix</h3><p>The job of the postfix is simple: call <code>exit</code> with the value of <code>RAX</code> since this will be the result of the last function called by the program.</p> <p><code>exit</code> is a syscall, so we'll use the <code>SYSCALL</code> instruction to call it. The x86_64 calling convention on macOS and Linux for <code>SYSCALL</code> defines parameters the same way <code>CALL</code> does. But we also need to tell <code>SYSCALL</code> which syscall to call. The convention is to set <code>RAX</code> to the integer representing the syscall on the current system. On Linux it is <code>60</code>; on macOS it is <code>0x2000001</code>.</p> <p class="note"> When I say "convention", I don't mean that you really have a choice as a programmer. It was arbitrary when the operating system and standard libraries chose it. But if you want to write a working program that uses syscalls or calls out to (say) glibc, you'll need to follow these conventions. 
</p><p>The postfix then looks like this:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="nx">os</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;os&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="nx">SYSCALL_MAP</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">os</span><span class="p">.</span><span class="nx">platform</span><span class="p">()</span><span class="w"> </span><span class="o">===</span><span class="w"> </span><span class="s1">&#39;darwin&#39;</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s1">&#39;exit&#39;</span><span class="o">:</span><span class="w"> </span><span class="s1">&#39;0x2000001&#39;</span><span class="p">,</span> <span class="p">}</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="s1">&#39;exit&#39;</span><span class="o">:</span><span class="w"> </span><span class="mf">60</span><span class="p">,</span> <span class="p">};</span> <span class="kd">function</span><span class="w"> </span><span class="nx">emit_postfix</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;MOV RDI, RAX&#39;</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set exit arg</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="sb">`MOV RAX, </span><span class="si">${</span><span 
class="nx">SYSCALL_MAP</span><span class="p">[</span><span class="s1">&#39;exit&#39;</span><span class="p">]</span><span class="si">}</span><span class="sb">`</span><span class="p">);</span><span class="w"> </span><span class="c1">// Set syscall number</span> <span class="w"> </span><span class="nx">emit</span><span class="p">(</span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="s1">&#39;SYSCALL&#39;</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>And we're set here too.</p> <h3 id="putting-it-all-together">Putting it all together</h3><p>We can finally write our JavaScript entrypoint and run our compiler against a sample program.</p> <p>The entrypoint might look like this:</p> <div class="highlight"><pre><span></span><span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">parse</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./parser&#39;</span><span class="p">);</span> <span class="kd">const</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="nx">compile</span><span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">require</span><span class="p">(</span><span class="s1">&#39;./compiler&#39;</span><span class="p">);</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">(</span><span class="nx">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="nx">script</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">args</span><span class="p">[</span><span 
class="mf">2</span><span class="p">];</span> <span class="w"> </span><span class="kd">const</span><span class="w"> </span><span class="p">[</span><span class="nx">ast</span><span class="p">]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nx">parse</span><span class="p">(</span><span class="nx">script</span><span class="p">);</span> <span class="w"> </span><span class="nx">compile</span><span class="p">(</span><span class="nx">ast</span><span class="p">[</span><span class="mf">0</span><span class="p">]);</span> <span class="p">}</span> <span class="nx">main</span><span class="p">(</span><span class="nx">process</span><span class="p">.</span><span class="nx">argv</span><span class="p">);</span> </pre></div> <p>And we can call it like so:</p> <div class="highlight"><pre><span></span><span class="nf">$</span><span class="w"> </span><span class="nv">node</span><span class="w"> </span><span class="nv">ulisp.js</span><span class="w"> </span><span class="s">&#39;(+ 3 (+ 2 1))&#39;</span> <span class="w"> </span><span class="nf">.global</span><span class="w"> </span><span class="nv">_main</span> <span class="w"> </span><span class="nf">.text</span> <span class="nl">plus:</span> <span class="w"> </span><span class="nf">ADD</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">RET</span> <span class="nl">_main:</span> <span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span 
class="p">,</span><span class="w"> </span><span class="mi">3</span> <span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">PUSH</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span> <span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RSI</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span> <span class="w"> </span><span class="nf">CALL</span><span class="w"> </span><span class="nv">plus</span> <span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RSI</span> <span class="w"> </span><span class="nf">POP</span><span class="w"> </span><span class="nb">RDI</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RDI</span><span class="p">,</span><span class="w"> </span><span class="nb">RAX</span> <span class="w"> </span><span class="nf">MOV</span><span class="w"> </span><span class="nb">RAX</span><span class="p">,</span><span class="w"> </span><span class="mh">0x2000001</span> <span class="w"> </span><span class="nf">SYSCALL</span> </pre></div> <h3 id="generating-an-executable-file-from-the-output">Generating an executable file from the output</h3><p>If we redirect the previous 
output to an assembly file and call <code>gcc</code> on it, we can generate a program we can run. Then we can echo the <code>$?</code> variable to see the exit code of the previous process.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>node<span class="w"> </span>ulisp.js<span class="w"> </span><span class="s1">&#39;(+ 3 (+ 2 1))&#39;</span><span class="w"> </span>&gt;<span class="w"> </span>program.S $<span class="w"> </span>gcc<span class="w"> </span>-mstackrealign<span class="w"> </span>-masm<span class="o">=</span>intel<span class="w"> </span>-o<span class="w"> </span>program<span class="w"> </span>program.S $<span class="w"> </span>./program $<span class="w"> </span><span class="nb">echo</span><span class="w"> </span><span class="nv">$?</span> <span class="m">6</span> </pre></div> <p>And we've got a working compiler! The full source of the compiler is available <a href="https://github.com/eatonphil/ulisp">here</a>.</p> <h3 id="further-reading">Further reading</h3><ul> <li><a href="https://aaronbloomfield.github.io/pdr/book/x86-64bit-ccc-chapter.pdf">x86_64 calling convention</a></li> <li>macOS assembly programming<ul> <li><a href="http://fabiensanglard.net/macosxassembly/index.php">Stack alignment on macOS</a></li> <li><a href="https://filippo.io/making-system-calls-from-assembly-in-mac-os-x/">Syscalls on macOS</a></li> </ul> </li> <li>Destination-driven code generation<ul> <li><a href="https://www.cs.indiana.edu/~dyb/pubs/ddcg.pdf">Kent Dybvig's original paper</a></li> <li><a href="http://cs.au.dk/~mis/dOvs/slides/46b-codegeneration-in-V8.pdf">One-pass code generation in V8</a></li> </ul> </li> </ul> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Finished that intro to compilers post :) lisp to assembly in Javascript <a href="https://t.co/0HDIn4Mv7a">https://t.co/0HDIn4Mv7a</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1066863077000441856?ref_src=twsrc%5Etfw">November 26, 
2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/compiler-basics-lisp-to-assembly.htmlTue, 20 Nov 2018 00:00:00 +0000On NYC, Tokyo and Seoulhttp://notes.eatonphil.com/on-nyc-tokyo-and-seoul.html<p>I’ve lived in NYC for the past year — moved here after years in Philly and after growing up in a rural community a few hours west of there. My wife is South Korean and last week concluded my second trip to the suburbs of Seoul to visit her family. We finished up that trip with a week in Tokyo.</p> <p>Long a mecha and Godzilla fan, I was struck by a city not significantly more modern, or significantly more “Eastern”, than NYC. In contrast, the lesser known Seoul is more modern than both cities and shares as much “Eastern” vibe as Tokyo.</p> <p>I’d go so far as to say that Seoul is the most livable of the three for anyone of a similar background. There are a few concrete areas that led me to this including transportation, apartments, WiFi/cafes, food, and language.</p> <p>I'll conclude with a few tourist recommendations and a list of books to read on South Korea and Japan if you share my enthusiasm for comparing life in different cities.</p> <h3 id="transportation">Transportation</h3><p>NYC is one of the few cities in the world with a subway that runs 24/7. Tokyo and Seoul do not share this trait despite being many decades newer. (Tokyo and Seoul were heavily damaged during World War II and the Korean War, respectively.) And despite being built later, Tokyo subway cars are even less wide than NYC subway cars (~8.2ft vs. ~8.5ft).</p> <p>In contrast, Seoul subway cars are ~10.2ft wide. 
The difference may seem slight but it is noticeable during rush hour when in Seoul there is space for four people to stand in the aisle versus room for perhaps two in a Tokyo or NYC subway car.</p> <p><img src="https://photos.travelblog.org/Photos/10223/428861/f/4174039-Seoul-subway-car-0.jpg" alt="Seoul subway car" /> <small>Seoul subway car, source: Travel Blog</small></p> <p>The Seoul subway system is also the most advanced in terms of safety. All stations have a floor-to-ceiling barrier with doors that only open when a train arrives. Most stations in Tokyo have a ~3ft tall barrier that does the same, though some stations have no barrier. In NYC there are no barriers anywhere.</p> <p>Concerning innovation, Seoul and Tokyo both have multiple driverless subway lines whereas NYC has none. But in terms of complexity the NYC subway is the simplest because you pay only once. Seoul and Tokyo subways are slightly more complex in that you swipe your card when you enter and exit (or transfer).</p> <h4 id="taxis">Taxis</h4><p>It was jarring to be greeted by the very 90s, vaguely British Toyota Crown taxi cabs that dominate the streets of Tokyo.</p> <p><img src="https://i.imgur.com/WuIHqxY_d.jpg?maxwidth=640&shape=thumb&fidelity=medium" alt="Toyota Crown cab" /> <small>Source: Phil Eaton</small></p> <p>These cabs have no integrated navigation unit but a modern unit was typically mechanically attached. We saw a few of the recently approved Toyota JPN Taxi, but they only account for around <a href="https://www.japantimes.co.jp/news/2018/05/23/business/taxi-tokyo-prepares-olympic-tourism-boom-accessible-cabs-international-drivers/">10 percent</a> of cabs. (The integrated navigation is massive, perhaps 10-inch screens.) 
In contrast, Seoul has a <a href="http://travel.cnn.com/seoul/life/seoul-taxi-guide-783378/">variety</a> of modern cabs all with integrated navigation — the most common of which is the Hyundai Sonata.</p> <p><img src="http://www.theseoulguide.com/wp-content/uploads/2013/09/regular_orange_taxi_in_seoul.jpg" alt="Hyundai Sonata cab" /> <small>Source: The Seoul Guide</small></p> <p>Although Japanese car companies <a href="https://www.motortrend.com/news/12q2-1993-eunos-mazda-cosmo-drive/">pioneered</a> integrated navigation in the 90s, it appears to have been the standard for South Korean car companies for the past 10-20 years.</p> <p>And then there’s NYC with its primary mix of Crown Victorias and Priuses with multiple 4-inch smartphones mechanically attached for navigation.</p> <p><img src="https://thenypost.files.wordpress.com/2013/10/cab2.jpg?quality=90&strip=all" /> <small>Source: New York Post</small></p> <h3 id="living">Living</h3><p>South Korea has no concept of the suburb oriented around single-family houses. Drive an hour or two out from Seoul or Busan and see the same massive, modern apartment complexes that are found in the city center. After that it's the stark farms of Kansas. Japan appears more like the US in that the city graduates steadily to suburb and farm.</p> <p><img src="https://cdn.japantimes.2xx.jp/wp-content/uploads/2013/09/wn20130918n2a-870x580.jpg" alt="Apartments in Seoul" /> <small>Apartments in Seoul, source: Japan Times</small></p> <p>In general, buildings in South Korea are fairly homogeneous. Even the downtown areas of Seoul have little architectural creativity. Tokyo and NYC are both diverse in building styles and sizes. However, NYC takes the cake for ubiquity of massive towers. 
In fact, the first time my South Korean father-in-law visited Manhattan he was blown away by this mass.</p> <p><img src="https://upload.wikimedia.org/wikipedia/commons/thumb/8/8e/Manhattan_Skyline_night.jpg/800px-Manhattan_Skyline_night.jpg" alt="Manhattan skyline"> <small>New York City, source: Wikipedia</small></p> <p>The most popular neighborhoods in Tokyo seem more developed than their Seoul counterparts: the mass of stores and crowds extends further. And while the average age of buildings in Tokyo seems younger than the average age of buildings throughout Seoul (including less desirable areas), the developed areas (including buildings and streets) of Seoul are significantly cleaner and more modern. In contrast, and on average, Tokyo buildings seem as old as NYC buildings.</p> <p><img src="https://cdn.fodors.com/wp-content/uploads/2018/02/Tokyo-Neighborhoods-Along-Arakawa-Streetcar-1.jpg" /> <small>Tokyo, source: Fodors</small></p> <h4 id="air-quality">Air quality</h4><p>Air quality in <a href="https://www.numbeo.com/pollution/in/New-York">NYC</a> and <a href="https://www.numbeo.com/pollution/in/Tokyo">Tokyo</a> is high and pollution is low. But in <a href="https://www.upi.com/Fine-dust-levels-soar-in-South-Korea/5581523776231/">recent times</a>, air quality in Seoul has deteriorated with dangerous levels of fine dust from factories in South Korea and China. It is not clear when or how the South Korean government will address this.</p> <h3 id="wifi/cafes">WiFi/Cafes</h3><p>My idea of a good cafe is a decent ratio of seats to traffic, available electrical outlets, and decent WiFi. NYC and Tokyo have some similarities: chain coffee shops are larger and non-chains are often pretty small. Tokyo differs from NYC in that there are few electrical outlets and in the existence of interior smoking sections. (Tokyo bans smoking while walking but designates smoking areas in parks and in inner rooms of restaurants and cafes.)</p> <p>But the WiFi in Tokyo is abysmal. 
Many cafes do not have it (though the trend is to provide it) and even the chains that do provide it have terrible speeds, peaking around 5Mbps down. In NYC WiFi is available at ~20Mbps down at most chains and ~5Mbps at smaller non-chains.</p> <p>In contrast, South Korea is the jewel of cafe culture. Unlike how in the US coffee shop size decreases as population increases, coffee shop sizes in South Korea are oddly enormous everywhere. South Korea is rich with local shops, domestic chains (including the exported Paris Baguette and Tous Les Jours), and foreign chains (South Korea has the highest number of Starbucks Reserve stores per capita of any country).</p> <p><img src="https://file.mk.co.kr/meet/neds/2018/06/image_readtop_2018_402044_15299876083365412.jpg" alt="" /> <small>Starbucks Reserve in Seoul, source: Pulse News</small></p> <p>From Jeju Island to Seoul we never worried about a seat or an outlet at a cafe. Furthermore, the WiFi in South Korea is incredible. My tech-hopeless in-laws’ basic internet plan got 80Mbps down and the small cafes near their apartment got at least 40Mbps down.</p> <p>NYC falls closer to Seoul in terms of ubiquity and speed of WiFi and has the added benefit of city-provided outdoor WiFi that is surprisingly fast and available throughout the city. NYC is much worse in terms of opening hours. Most cafes close between 8-10pm whereas cafes in Seoul and Tokyo easily stay open past 11pm.</p> <h4 id="caveat">Caveat</h4><p>It’s not exactly fair to exclude internet cafes, prevalent in both Seoul and Tokyo (oddly even NYC has a <a href="https://www.google.com/maps?q=nyc+internet+cafe&amp;um=1&amp;ie=UTF-8&amp;sa=X&amp;ved=0ahUKEwjE0-PxuZTeAhWTdXAKHYDFB1cQ_AUIDigB">few</a>). At an internet cafe in Tokyo you can expect abundant outlets and excellent WiFi (I saw peaks of 40Mbps down). I did not visit an internet cafe in Seoul but I expect it to be similar. 
In both Seoul and Tokyo you can easily find 24/7 service (with showers!?).</p> <p>I did not include internet cafes above because I find them slightly less convenient for tourists. Though credit is due: unlike American Chinatown internet cafes, the ones we visited in Tokyo were very clean, spacious and warm.</p> <p><img src="http://rakutama.com/en/images/shop/koriyama.jpg" alt="Internet cafe in Shinjuku" /> <small>Internet cafe in Shinjuku, source: Rakutama</small></p> <h3 id="food">Food</h3><p>Dining out in NYC is similar in cost to other major US cities. The quality is usually pretty good. Food in Tokyo was about as expensive as in NYC and generally as high quality. For instance, most dinners in NYC and Tokyo cost about $40-60 for two people. In contrast, most entrees in Seoul are sold for two and the dinner in total was often about $20-40. Restaurants on average seemed to be lower quality in Seoul compared to Tokyo and New York, but there are still more than enough high quality options.</p> <h3 id="language">Language</h3><p>I am biased, having a better knowledge of Korean than Japanese and a South Korean partner to fall back on. But I believe South Korea is the more friendly place for an English speaker in that it is more dedicated to providing English translations and that the written language is simpler. In both cities the penetration of English-speaking natives (and quality of speech and comprehension) is indistinguishable and decent.</p> <p>To the first point, even the oddest locations and obscure signage had English translations in South Korea (not just Seoul) — not so even within Tokyo.</p> <p>To the second point, Japanese has three writing systems (kanji, hiragana, and katakana). Kanji (characters originating from Chinese) cannot be replaced in writing by phonetic counterparts in hiragana or katakana. So you have little choice but to memorize all important characters, disregarding the fact that many characters can be broken down. 
Then you must also memorize the phonetic syllabaries of hiragana and katakana.</p> <p>In contrast, Korean has two writing systems (hangul and hanja) where hanja (characters originating from Chinese) is primarily used in formal settings (government forms, academic books, etc.) and can be replaced with the phonetic equivalent in hangul.</p> <p>This makes it much simpler to memorize and read Korean compared to Japanese.</p> <h3 id="assorted-recommendations">Assorted recommendations</h3><p>For New Yorkers, don’t stay in the recommended areas of Shinjuku/Shibuya/Roppongi unless you’re the type who’d enjoy staying around Times Square. These three areas of Tokyo are just as obnoxious albeit much safer. I also don’t recommend the Harajuku area; it is extra. There’s no real equivalent level of crazy in Seoul although Hongdae comes close.</p> <p>In a future Tokyo trip I’d stick to the Meguro Station area including Ebisu and Daikanyama. They are beautiful, quiet neighborhoods with lots of restaurants and cafes beside the Meguro River. Areas along the Sumida River are also beautiful and quiet. Ginza/Tokyo Station is also a fun-but-not-obnoxious area to visit.</p> <p><img src="https://odis.homeaway.com/odis/listing/f3fd8dfd-c29e-4ab3-a0cd-19a99bdc3c7f.c10.jpg" alt="Ebisu"> <small>Ebisu, source: Homeaway</small></p> <p>I cannot recommend the Edo-Tokyo Museum enough; it is the best city museum I've visited. Tsukiji is also a must-see, reminding me how much I miss going to Reading Terminal Market each weekend in Philly.</p> <p>In Seoul I’d recommend Yeonnam-Dong, Itaewon (which is much nicer than it’s made out to be), and Gwanghwamun. Mapo-Gu in general is a great region of Gangbuk as is the area below it (near Yeouido) in Gangnam.</p> <p><img src="https://i.imgur.com/ttdg5Y7.jpg?maxwidth=640" alt="Yeonnam-dong" /> <small>Yeonnam-Dong, source: Phil Eaton</small></p> <p>I recommend visiting the National Museum of Korea in Seoul as well as Hangang Park and Gyeongui Line Forest Park. 
The areas around the Tancheon stream flowing south to Bundang are also beautiful.</p> <p><img src="https://misadventuresofanawkwardamerican.files.wordpress.com/2014/05/dscn05912.jpg" alt="Tancheon near Bundang" /> <small>Tancheon near Bundang, source: Misadventures of an Awkward American</small></p> <h3 id="conclusion">Conclusion</h3><p>I came to Tokyo with the expectation of a highly modern city fused with Eastern culture. But it is difficult to see many ways it is ahead of NYC technically and it is very similar to NYC culturally. In some ways Tokyo even seems a little stuck in the past or just... off. Why are all vending machines (e.g. for tickets, ordering food, etc.) mechanical and not touch screens? The National Museum of Science is awfully old and ugly, the National Diet Building the same.</p> <p>So on the one hand I’d like to let the next person down lightly on the excitement of Japan. Tokyo is a world-class city with great restaurants, live music and refined culture but all in all very similar to NYC. 
On the other hand I recommend Seoul for a cheaper, cleaner, more English-speaker friendly, and genuinely novel city with splashes of "Eastern" romantic elements like Tokyo.</p> <p><img src="http://www.englishspectrum.com/wp-content/uploads/2015/03/yeoido.JPG-1.jpg" alt="Cherry blossoms in Seoul" /> <small>Cherry blossoms in Seoul, source: English Spectrum</small></p> <h3 id="further-reading">Further reading</h3><p><a href="https://amzn.to/2PNOsih">MITI and the Japanese Miracle: The Growth of Industrial Policy, 1925-1975</a> is an excellent, albeit somewhat disputed, introduction to the modern Japanese economy.</p> <p><a href="https://amzn.to/2EIw6hc">Asia’s Next Giant: South Korea and Late Industrialization</a> is a similar high-quality introduction to the South Korean economy.</p> <p>If you’re only familiar with US/Canadian companies or other “pure” market economies these two books are a great read on different, challenging styles of government policy, corporate structure, and life.</p> <p class="note"> P.S. I’m looking for book recommendations on the last 20 years of economic/political history in Japan and South Korea and on the last 100 years of economic/political history in the US and NYC. </p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a few points of comparison between <a href="https://twitter.com/hashtag/nyc?src=hash&amp;ref_src=twsrc%5Etfw">#nyc</a>, <a href="https://twitter.com/hashtag/seoul?src=hash&amp;ref_src=twsrc%5Etfw">#seoul</a>, and <a href="https://twitter.com/hashtag/tokyo?src=hash&amp;ref_src=twsrc%5Etfw">#tokyo</a> after finishing a recent trip. 
<a href="https://t.co/oKo4YlTZV3">https://t.co/oKo4YlTZV3</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/1053645222402416641?ref_src=twsrc%5Etfw">October 20, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/on-nyc-tokyo-and-seoul.htmlSat, 20 Oct 2018 00:00:00 +0000Why (and how) to read bookshttp://notes.eatonphil.com/why-and-how-to-read-books.html<p>The last time I read for fun was in elementary school. Since college I knew I must read more, but I never forced myself to build the habit. Then three years ago I spent time around my brother and a coworker who were avid readers. This "peer pressure" helped me get started.</p> <p>Since I started, I've seen concrete improvements in vocabulary. I find myself using words I didn't know I knew. I question my choice of words more. And I understand coworkers a little better. Perhaps it is only personal style, but I've also become more aware of hyperbole in my speech and have begun to tone that down.</p> <p>Specifically, books provide more density of information than I can pull together myself. I've also benefited heavily from reading books on tools I use daily. Contrary to being boring, a book on a topic with which I'm familiar has been a (often needed) break from books on topics with which I am unfamiliar. The former category might include books on CSS, Bash, Emacs, Python, Scheme, data modeling, Linux/FreeBSD system administration, mystery novels, and so on. The latter category might include books on Common Lisp, system architecture, the implementation of Linux/FreeBSD, behavioral psychology, management, stock/bond markets, the history of Argentina/Chile/South Korea/Japan, sci-fi novels, and so on.</p> <p>Reading diversely exposes how little I know. And that can be depressing. 
But I'm fairly confident reading books is the fastest way to grow.</p> <p>Tactically speaking, I started slowly with a few books and the ones easiest for me to read. The first year I read two books, both technical. The second year I read nine books and was able to start branching out beyond technical books. Last year I read a much more diverse set of forty books. And this year I followed suit with forty-one so far (on track for fifty-five or so).</p> <p>I keep track of books I'm reading and books I want to read in <a href="https://www.goodreads.com/eatonphil">Goodreads</a>. I particularly enjoy their reading challenge system that lets you know if you are on track to meet your reading goal for the year.</p> http://notes.eatonphil.com/why-and-how-to-read-books.htmlWed, 26 Sep 2018 00:00:00 +0000Compiling dynamic programming languageshttp://notes.eatonphil.com/compiling-dynamic-programming-languages.html<p>It can be difficult to shake the idea that dynamically typed programming languages are tied to byte-code interpreters (e.g. YARV Ruby, CPython, V8, Zend Engine, etc.). But for many languages, a compiled implementation also exists. Cython, Chicken Scheme and SBCL are good examples.</p> <p>In this post I will briefly describe how I built a compiler for my <a href="https://github.com/eatonphil/bsdscheme">Scheme implementation</a> using artifacts from the interpreter. In doing this, I learned a simple (not novel) technique for compiling dynamic languages. I'll introduce the <a href="https://github.com/eatonphil/jsc">Javascript to C++/V8 compiler</a> I am developing using this technique.</p> <h3 id="bsdscheme">BSDScheme</h3><p>For the past year I've developed a Scheme implementation, <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>. I started with an AST-interpreter (as opposed to a byte-code compiler and VM). 
A more detailed blog post on the first few steps writing BSDScheme can be found <a href="http://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.html">here</a>.</p> <p>As I built up support for the various objects and operations in the language, I had a sizeable base of D code for the BSDScheme runtime. This included an object representation for primitive types (and support for converting to and from types in D) as well as basic Scheme operations (<code>+</code>, <code>-</code>, <code>car</code>, <code>cdr</code>, etc.).</p> <p>When the time came to implement a compiler backend, I only needed to do codegen since the parser already existed. Furthermore, the fundamental bits had already been written: object representation and much of the standard library. So I wrote the simplest compiler I could think of by targeting D and the objects/functions I had already written to support the interpreter.</p> <p>Take, for example, the <code>equals</code> <a href="https://github.com/eatonphil/bsdscheme/blob/master/src/common.d#L140">function</a> in the standard library:</p> <div class="highlight"><pre><span></span><span class="n">Value</span><span class="w"> </span><span class="nf">equals</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">rest</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">tuple</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueToList</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">left</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">tuple</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="w"> </span><span class="k">auto</span><span class="w"> </span><span class="n">right</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">car</span><span class="p">(</span><span class="n">tuple</span><span class="p">[</span><span class="mi">1</span><span class="p">]);</span> <span class="w"> </span><span class="kt">bool</span><span class="w"> </span><span class="n">b</span><span class="p">;</span> <span class="w"> </span><span class="k">switch</span><span class="w"> </span><span class="p">(</span><span class="n">tagOfValue</span><span class="p">(</span><span class="n">left</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Integer</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsInteger</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToInteger</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToInteger</span><span class="p">(</span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Char</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span 
class="w"> </span><span class="n">valueIsChar</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToChar</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToChar</span><span class="p">(</span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">String</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsString</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToString</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToString</span><span class="p">(</span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Symbol</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsSymbol</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToSymbol</span><span 
class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToSymbol</span><span class="p">(</span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Function</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsFunction</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToFunction</span><span class="p">(</span><span class="n">left</span><span class="p">)[</span><span class="mi">1</span><span class="p">]</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="n">valueToFunction</span><span class="p">(</span><span class="n">right</span><span class="p">)[</span><span class="mi">1</span><span class="p">];</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">case</span><span class="w"> </span><span class="no">ValueTag</span><span class="p">.</span><span class="no">Bool</span><span class="p">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">valueIsBool</span><span class="p">(</span><span class="n">right</span><span class="p">)</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">valueToBool</span><span class="p">(</span><span class="n">left</span><span class="p">)</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span 
class="n">valueToBool</span><span class="p">(</span><span class="n">right</span><span class="p">);</span> <span class="w"> </span><span class="k">break</span><span class="p">;</span> <span class="w"> </span><span class="k">default</span><span class="o">:</span> <span class="w"> </span><span class="n">b</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nb">false</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">makeBoolValue</span><span class="p">(</span><span class="n">b</span><span class="p">);</span> <span class="p">}</span> </pre></div> <p>So long as my compiler generated code that used the <code>Value</code> object to represent Scheme data, I already had an <code>equals</code> function and large swaths of a Scheme standard library that I could share between the compiler and interpreter.</p> <p>Ultimately I only needed to implement a few control structures to support compiling a large subset of what I supported in the interpreter. 
The key ones were function definitions (emitted as D functions), function calls (emitted as D function calls), if/else (emitted as D if/else) and so on.</p> <p>To give a concrete example of a whole program compiled, this Scheme program:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="nv">pow</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nb">=</span><span class="w"> </span><span class="nv">pow</span><span class="w"> </span><span class="mi">0</span><span class="p">)</span> <span class="w"> </span><span class="mi">1</span> <span class="w"> </span><span class="p">(</span><span class="nb">*</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="nv">base</span><span class="w"> </span><span class="p">(</span><span class="nb">-</span><span class="w"> </span><span class="nv">pow</span><span class="w"> </span><span class="mi">1</span><span class="p">)))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">main</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">display</span><span class="w"> </span><span class="p">(</span><span class="nb">exp</span><span class="w"> </span><span class="mi">2</span><span class="w"> </span><span class="mi">16</span><span class="p">))</span> <span class="p">(</span><span class="nb">newline</span><span class="p">))</span> </pre></div> <p>when run through the BSDScheme compiler would become:</p> <div class="highlight"><pre><span></span><span class="k">import</span><span class="w"> </span><span 
class="n">std</span><span class="p">.</span><span class="n">stdio</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">lex</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">common</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">parse</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">utility</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">value</span><span class="p">;</span> <span class="k">import</span><span class="w"> </span><span class="n">buffer</span><span class="p">;</span> <span class="n">Value</span><span class="w"> </span><span class="nf">exp</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">ctx</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Value</span><span class="p">[]</span><span class="w"> </span><span class="n">tmp</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">listToVector</span><span class="p">(</span><span class="n">arguments</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">base</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmp</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">pow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">tmp</span><span class="p">[</span><span class="mi">1</span><span 
class="p">];</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">equals_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">equals</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">pow</span><span class="p">,</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">0</span><span class="p">)]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">if_result</span><span class="p">;</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">truthy</span><span class="p">(</span><span class="n">equals_result</span><span class="p">))</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="n">if_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">);</span> <span class="w"> </span><span class="p">}</span><span class="w"> </span><span class="k">else</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">minus_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">minus</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">pow</span><span class="p">,</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">1</span><span class="p">)]),</span><span 
class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">exp_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exp</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">minus_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">times_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">base</span><span class="p">,</span><span class="w"> </span><span class="n">exp_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">if_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">times_result</span><span class="p">;</span> <span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">if_result</span><span class="p">;</span> <span class="p">}</span> <span class="n">Value</span><span class="w"> </span><span class="nf">BSDScheme_main</span><span class="p">(</span><span class="n">Value</span><span class="w"> </span><span class="n">arguments</span><span class="p">,</span><span class="w"> </span><span class="kt">void</span><span class="o">**</span><span class="w"> </span><span class="n">ctx</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span 
class="n">exp_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">exp</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">2</span><span class="p">),</span><span class="w"> </span><span class="n">makeIntegerValue</span><span class="p">(</span><span class="mi">16</span><span class="p">)]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">display_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">display</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([</span><span class="n">exp_result</span><span class="p">]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="n">Value</span><span class="w"> </span><span class="n">newline_result</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">newline</span><span class="p">(</span><span class="n">vectorToList</span><span class="p">([]),</span><span class="w"> </span><span class="n">null</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="n">newline_result</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">BSDScheme_main</span><span class="p">(</span><span class="n">nilValue</span><span class="p">,</span><span class="w"> </span><span class="n">cast</span><span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="mi">0</span><span 
class="p">);</span><span class="w"> </span><span class="p">}</span> </pre></div> <p>Where <em>every imported function had already been written for the interpreter</em>. I had only to translate a few lines to D and import/call these existing libraries. Now I had a small <em>binary</em> of compiled Scheme.</p> <p>It was at this point I realized I was using the same technique used by Cython to compile Python code.</p> <p class="note"> ...the Cython project has approached this problem by means of a source code compiler that translates Python code to equivalent C code. This code is executed within the CPython runtime environment, but at the speed of compiled C and with the ability to call directly into C libraries. <a href="http://docs.cython.org/en/latest/src/quickstart/overview.html"> http://docs.cython.org/en/latest/src/quickstart/overview.html </a> </p><h3 id="jsc">jsc</h3><p>I played with many PL-research-y languages over the years and wanted to build something a little more practical. So I took what I learned writing the BSDScheme compiler and decided to write a Javascript compiler. Specifically, it would target the easiest backend I could imagine: C++ using the V8 C++ library and generating a Node addon.</p> <p>There already existed well-trodden guides/means of writing Node addons in C++ so I spent some time trying to hand-compile simple Javascript programs to C++ and V8. A string in Javascript would become a <code>v8::String</code> type in C++. A number in Javascript would become <code>v8::Number</code> in C++ and so forth.</p> <p>I decided to write this compiler in Rust given its roots in (and my familiarity with) ML and Python. I found a <a href="https://github.com/dherman/esprit">Javascript parser by Dave Herman</a> and after a few lazy weeks finally got a "Hello world!" program compiling. 
Getting my first program to compile has by far been the hardest part of building jsc.</p> <p>Let's look at a concrete example of a recursive fibonacci program (example/recursion.js in the <a href="https://github.com/eatonphil/jsc">repo</a>):</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">i</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">i</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">2</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">20</span><span class="p">));</span> <span class="p">}</span> </pre></div> <p>Let's add a call to 
<code>main()</code> at the end and time this with Node to get a baseline:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>example/recursion.js <span class="m">6765</span> node<span class="w"> </span>example/recursion.js<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">97</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.083<span class="w"> </span>total </pre></div> <p>Now let's install jsc to compare. We'll need Rust, Cargo, Node and node-gyp.</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>git<span class="w"> </span>clone<span class="w"> </span>https://github.com/eatonphil/jsc $<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>jsc $<span class="w"> </span>make<span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span>make<span class="w"> </span>install $<span class="w"> </span>jsc<span class="w"> </span>example/recursion.js </pre></div> <p>jsc produces a Javascript entrypoint that imports our addon (build/recursion.js):</p> <div class="highlight"><pre><span></span><span class="nx">require</span><span class="p">(</span><span class="s2">&quot;./build/Release/recursion&quot;</span><span class="p">).</span><span class="nx">jsc_main</span><span class="p">();</span> </pre></div> <p>And it produces a C++ file that represents the entire program (build/recursion.cc):</p> <div class="highlight"><pre><span></span><span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;string&gt;</span> <span class="cp">#include</span><span class="w"> </span><span class="cpf">&lt;node.h&gt;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span 
class="n">Boolean</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Context</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Exception</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Function</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionTemplate</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">FunctionCallbackInfo</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Isolate</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Local</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Null</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Number</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Object</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">String</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">False</span><span class="p">;</span> <span class="k">using</span><span class="w"> 
</span><span class="n">v8</span><span class="o">::</span><span class="n">True</span><span class="p">;</span> <span class="k">using</span><span class="w"> </span><span class="n">v8</span><span class="o">::</span><span class="n">Value</span><span class="p">;</span> <span class="kt">void</span><span class="w"> </span><span class="nf">fib</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">i</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="nl">tail_recurse_1</span><span class="p">:</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Context</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ctx_2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">global_3</span><span class="w"> 
</span><span class="o">=</span><span class="w"> </span><span class="n">ctx_2</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">();</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">Boolean_4</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">global_3</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;Boolean&quot;</span><span class="p">)));</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_5</span><span class="p">(</span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span class="n">string</span><span class="w"> </span><span class="n">string_tmp_6</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_5</span><span class="p">);</span> <span class="w"> </span><span class="n">String</span><span class="o">::</span><span class="n">Utf8Value</span><span class="w"> </span><span class="n">utf8value_tmp_7</span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">));</span> <span class="w"> </span><span class="n">std</span><span class="o">::</span><span 
class="n">string</span><span class="w"> </span><span class="n">string_tmp_8</span><span class="p">(</span><span class="o">*</span><span class="n">utf8value_tmp_7</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_9</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span 
class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">((</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Boolean</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">string_tmp_6</span><span class="w"> </span><span class="o">&lt;=</span><span class="w"> </span><span class="n">string_tmp_8</span><span class="p">)</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="p">(</span><span class="n">False</span><span class="p">(</span><span class="n">isolate</span><span class="p">)))</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_10</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Boolean_4</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_9</span><span class="p">);</span> <span class="w"> 
</span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="n">result_10</span><span class="o">-&gt;</span><span class="n">ToBoolean</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">())</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">(</span><span class="n">i</span><span class="p">);</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span 
class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">1</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">FunctionTemplate</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ftpl_13</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_12</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_13</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">fn_12</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span 
class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;fib&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_14</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_11</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_15</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_12</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_14</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">(</span><span class="n">i</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span 
class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">i</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="mi">2</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">FunctionTemplate</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ftpl_18</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span> <span class="w"> 
</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_17</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_18</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">fn_17</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;fib&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_19</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_16</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_20</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_17</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_19</span><span class="p">);</span> <span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetReturnValue</span><span class="p">().</span><span class="n">Set</span><span class="p">((</span><span 
class="n">result_15</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">result_20</span><span class="o">-&gt;</span><span class="n">IsString</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">Concat</span><span class="p">(</span><span class="n">result_15</span><span class="o">-&gt;</span><span class="n">ToString</span><span class="p">(),</span><span class="w"> </span><span class="n">result_20</span><span class="o">-&gt;</span><span class="n">ToString</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">((</span><span class="n">result_15</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">()</span><span class="w"> </span><span class="o">||</span><span class="w"> </span><span class="n">result_20</span><span class="o">-&gt;</span><span class="n">IsNumber</span><span class="p">())</span><span class="w"> </span><span class="o">?</span><span class="w"> </span><span class="p">(</span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">result_15</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()</span><span class="w"> 
</span><span class="o">+</span><span class="w"> </span><span class="n">result_20</span><span class="o">-&gt;</span><span class="n">ToNumber</span><span class="p">(</span><span class="n">isolate</span><span class="p">)</span><span class="o">-&gt;</span><span class="n">Value</span><span class="p">()))</span><span class="w"> </span><span class="o">:</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Number</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">))));</span> <span class="w"> </span><span class="k">return</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">jsc_main</span><span class="p">(</span><span class="k">const</span><span class="w"> </span><span class="n">FunctionCallbackInfo</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;&amp;</span><span class="w"> </span><span class="n">args</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">Isolate</span><span class="o">*</span><span class="w"> </span><span class="n">isolate</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">args</span><span class="p">.</span><span class="n">GetIsolate</span><span class="p">();</span> <span class="nl">tail_recurse_21</span><span class="p">:</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_22</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Number</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span 
class="mi">20</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">FunctionTemplate</span><span class="o">&gt;</span><span class="w"> </span><span class="n">ftpl_24</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">FunctionTemplate</span><span class="o">::</span><span class="n">New</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="n">fib</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_23</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">ftpl_24</span><span class="o">-&gt;</span><span class="n">GetFunction</span><span class="p">();</span> <span class="w"> </span><span class="n">fn_23</span><span class="o">-&gt;</span><span class="n">SetName</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;fib&quot;</span><span class="p">));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_25</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_22</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_26</span><span class="w"> </span><span class="o">=</span><span class="w"> 
</span><span class="n">fn_23</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_25</span><span class="p">);</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">arg_27</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">result_26</span><span class="p">;</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;</span><span class="w"> </span><span class="n">fn_28</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Function</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;::</span><span class="n">Cast</span><span class="p">(</span><span class="n">isolate</span><span class="o">-&gt;</span><span class="n">GetCurrentContext</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Global</span><span class="p">()</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;console&quot;</span><span class="p">)))</span><span class="o">-&gt;</span><span class="n">Get</span><span class="p">(</span><span class="n">String</span><span class="o">::</span><span class="n">NewFromUtf8</span><span 
class="p">(</span><span class="n">isolate</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;log&quot;</span><span class="p">)));</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">argv_29</span><span class="p">[]</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="p">{</span><span class="w"> </span><span class="n">arg_27</span><span class="w"> </span><span class="p">};</span> <span class="w"> </span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Value</span><span class="o">&gt;</span><span class="w"> </span><span class="n">result_30</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fn_28</span><span class="o">-&gt;</span><span class="n">Call</span><span class="p">(</span><span class="n">Null</span><span class="p">(</span><span class="n">isolate</span><span class="p">),</span><span class="w"> </span><span class="mi">1</span><span class="p">,</span><span class="w"> </span><span class="n">argv_29</span><span class="p">);</span> <span class="w"> </span><span class="n">result_30</span><span class="p">;</span> <span class="p">}</span> <span class="kt">void</span><span class="w"> </span><span class="nf">Init</span><span class="p">(</span><span class="n">Local</span><span class="o">&lt;</span><span class="n">Object</span><span class="o">&gt;</span><span class="w"> </span><span class="n">exports</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="n">NODE_SET_METHOD</span><span class="p">(</span><span class="n">exports</span><span class="p">,</span><span class="w"> </span><span class="s">&quot;jsc_main&quot;</span><span class="p">,</span><span class="w"> </span><span class="n">jsc_main</span><span class="p">);</span> <span class="p">}</span> <span 
class="n">NODE_MODULE</span><span class="p">(</span><span class="n">NODE_GYP_MODULE_NAME</span><span class="p">,</span><span class="w"> </span><span class="n">Init</span><span class="p">)</span> </pre></div> <p>Let's time this version:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>build/recursion.js <span class="m">6765</span> node<span class="w"> </span>build/recursion.js<span class="w"> </span><span class="m">0</span>.16s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.03s<span class="w"> </span>system<span class="w"> </span><span class="m">107</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.175<span class="w"> </span>total </pre></div> <p>jsc, over twice as slow, is already falling behind Node. :)</p> <p>As I increased the number passed to my fibonacci function, the compiled program's time to completion got exponentially worse. Node stayed the same. I decided to try tail-call optimization to narrow the performance gap between Node and jsc.</p> <p>I implemented tail-call optimization for the interpreter in BSDScheme by putting all functions in a loop that would break if tail-call elimination was not to happen. It took me a week to implement this and I never put it in place for the compiler. This time around I was able to add basic tail call elimination to jsc in two hours. 
It is implemented with <code>label</code>s and <code>goto</code>s that replace the tail call where applicable.</p> <p>Here is a tail-call optimized version of the same program (example/tco.js):</p> <div class="highlight"><pre><span></span><span class="kd">function</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">0</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">a</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">if</span><span class="w"> </span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="mf">1</span><span class="p">)</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">b</span><span class="p">;</span> <span class="w"> </span><span class="p">}</span> <span class="w"> </span><span class="k">return</span><span class="w"> </span><span class="nx">fib</span><span class="p">(</span><span class="nx">n</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="mf">1</span><span class="p">,</span><span class="w"> </span><span class="nx">b</span><span class="p">,</span><span class="w"> </span><span class="nx">a</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span 
class="nx">b</span><span class="p">);</span> <span class="p">}</span> <span class="kd">function</span><span class="w"> </span><span class="nx">main</span><span class="p">()</span><span class="w"> </span><span class="p">{</span> <span class="w"> </span><span class="nx">console</span><span class="p">.</span><span class="nx">log</span><span class="p">(</span><span class="nx">fib</span><span class="p">(</span><span class="mf">50</span><span class="p">,</span><span class="w"> </span><span class="mf">0</span><span class="p">,</span><span class="w"> </span><span class="mf">1</span><span class="p">));</span> <span class="p">}</span> </pre></div> <p>We add a call to <code>main()</code> again for Node and time it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>example/tco.js <span class="m">12586269025</span> node<span class="w"> </span>example/tco.js<span class="w"> </span><span class="m">0</span>.06s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">96</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.080<span class="w"> </span>total </pre></div> <p>And compile it with jsc and time it:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>jsc<span class="w"> </span>example/tco.js $<span class="w"> </span><span class="nb">time</span><span class="w"> </span>node<span class="w"> </span>build/tco.js <span class="m">12586269025</span> node<span class="w"> </span>build/tco.js<span class="w"> </span><span class="m">0</span>.07s<span class="w"> </span>user<span class="w"> </span><span class="m">0</span>.02s<span class="w"> </span>system<span class="w"> </span><span class="m">95</span>%<span class="w"> </span>cpu<span class="w"> </span><span class="m">0</span>.087<span class="w"> </span>total </pre></div> <p>Well that's not bad at all. 
:)</p> <h3 id="next-steps-with-jsc">Next steps with jsc</h3><p>jsc has very limited support for... everything. Today I added almost all primitive numeric operations, equality/inequality operations, and unit tests. jsc does not yet support nested functions, callbacks, or closures. It supports <code>while</code> loops but not yet <code>for</code> loops. And I'm not sure if it supports <code>else if</code>. It does not support arrays or objects, let alone constructors and prototypes. Adding support for these is low-hanging fruit.</p> <p>After the low-hanging fruit, more interesting projects for jsc include:</p> <ul> <li>generating C++ with embedded V8 rather than only targeting Node addons</li> <li>type inference or type hinting for generating unboxed functions a la Cython and SBCL</li> </ul> http://notes.eatonphil.com/compiling-dynamic-programming-languages.htmlSun, 02 Sep 2018 00:00:00 +0000btest: a language agnostic test runnerhttp://notes.eatonphil.com/btest-a-language-agnostic-test-runner.html<p><a href="https://github.com/briansteffens/btest">btest</a> is a minimal, language-agnostic test runner originally written for testing compilers. Brian, a former co-worker from Linode, wrote the first implementation in <a href="https://crystal-lang.org/">Crystal</a> (a compiled language with Ruby-like syntax) for testing <a href="https://github.com/briansteffens/bshift">bshift</a>, a compiler project. The tool accomplished exactly what I needed for my own language project, <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a>, and had very few dependencies. 
After some issues with Crystal support in containerized CI environments, and despite some incredible <a href="https://github.com/briansteffens/btest/pull/5">assistance from</a> <a href="https://github.com/briansteffens/btest/pull/4">the Crystal community</a>, we rewrote btest in D to simplify downstream use.</p> <h3 id="how-it-works">How it works</h3><p>btest registers a command (or commands) to run and verifies the command output and status for different inputs. btest iterates over files in a directory to discover test groups and the individual tests within them. It supports a limited template language for generating sets of similar tests that differ only in small ways. And it supports running test groups, and the individual tests within them, in parallel. All of this is managed via a simple YAML config.</p> <h3 id="btest.yaml">btest.yaml</h3><p>btest requires a project-level configuration file to declare the test directory, the command(s) to run per test, etc. Let's say we want to run tests against a Python program. We create a <code>btest.yaml</code> file with the following:</p> <div class="highlight"><pre><span></span><span class="nt">test_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tests</span> <span class="nt">runners</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with cpython</span> <span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python test.py</span> </pre></div> <p><code>test_path</code> is the directory in which tests are located. <code>runners</code> is an array of commands to run per test. 
The command hard-codes <code>test.py</code>, a project-level standard file name; btest writes a file with that name to disk at an appropriate path for each test case.</p> <h4 id="on-multiple-runners">On multiple runners</h4><p>Using multiple runners is helpful when we want to run all tests with different test commands or test command settings. For instance, we could run tests against cpython and pypy by adding another runner to the <code>runners</code> section.</p> <div class="highlight"><pre><span></span><span class="nt">test_path</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">tests</span> <span class="nt">runners</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with cpython</span> <span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">python test.py</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Run tests with pypy</span> <span class="w"> </span><span class="nt">run</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">pypy test.py</span> </pre></div> <h3 id="an-example-test-config">An example test config</h3><p>Let's create a <code>divide-by-zero.yaml</code> file in the <code>tests</code> directory and add the following:</p> <div class="highlight"><pre><span></span><span class="nt">cases</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">name</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">Should exit on divide by zero</span> <span class="w"> 
</span><span class="nt">status</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">1</span> <span class="w"> </span><span class="nt">stdout</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">Traceback (most recent call last):</span> <span class="w"> </span><span class="no">File &quot;test.py&quot;, line 1, in &lt;module&gt;</span> <span class="w"> </span><span class="no">4 / 0</span> <span class="w"> </span><span class="no">ZeroDivisionError: division by zero</span> <span class="w"> </span><span class="nt">denominator</span><span class="p">:</span><span class="w"> </span><span class="l l-Scalar l-Scalar-Plain">0</span> <span class="nt">templates</span><span class="p">:</span> <span class="w"> </span><span class="p p-Indicator">-</span><span class="w"> </span><span class="nt">test.py</span><span class="p">:</span><span class="w"> </span><span class="p p-Indicator">|</span> <span class="w"> </span><span class="no">4 / {{ denominator }}</span> </pre></div> <p>In this example, <code>name</code> will be printed out when the test is run. <code>status</code> is the expected integer exit status returned by running the program. <code>stdout</code> is the entire expected output written by the program during execution. None of these three fields are required. If <code>status</code> or <code>stdout</code> is not provided, btest will skip checking it.</p> <p>Any additional key-value pairs are treated as template variables, and their values are substituted wherever they are referenced in the <code>templates</code> section when the case is run. <code>denominator</code> is the only such variable we use in this example. 
When this first (and only) case is run, <code>test.py</code> will be written to disk containing <code>4 / 0</code>.</p> <h4 id="templates-section">templates section</h4><p>The <code>templates</code> section is a dictionary allowing us to specify files to be created with variable substitution. All files are created in the same directory per test case, so if we want to import code we can do so with relative paths.</p> <p><a href="https://github.com/eatonphil/bsdscheme/blob/master/tests/include.yaml">Here</a> is a simple example of a BSDScheme test that uses this feature.</p> <h3 id="running-btest">Running btest</h3><p>Run btest from the root directory (the directory above <code>tests</code>) and we'll see all the grouped test cases that btest registers and the result of each test:</p> <div class="highlight"><pre><span></span><span class="err">$</span><span class="w"> </span><span class="n">btest</span> <span class="n">tests</span><span class="o">/</span><span class="n">divide</span><span class="o">-</span><span class="k">by</span><span class="o">-</span><span class="n">zero</span><span class="p">.</span><span class="n">yaml</span> <span class="o">[</span><span class="n">PASS</span><span class="o">]</span><span class="w"> </span><span class="n">Should</span><span class="w"> </span><span class="k">exit</span><span class="w"> </span><span class="k">on</span><span class="w"> </span><span class="n">divide</span><span class="w"> </span><span class="k">by</span><span class="w"> </span><span class="n">zero</span> <span class="mi">1</span><span class="w"> </span><span class="k">of</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="n">tests</span><span class="w"> </span><span class="n">passed</span><span class="w"> </span><span class="k">for</span><span class="w"> </span><span class="nl">runner</span><span class="p">:</span><span class="w"> </span><span class="n">Run</span><span class="w"> </span><span class="n">tests</span><span 
class="w"> </span><span class="k">with</span><span class="w"> </span><span class="n">cpython</span> </pre></div> <h3 id="use-in-ci-environments">Use in CI environments</h3><p>In the future we may provide pre-built release binaries. But in the meantime, the CI step involves downloading git and ldc and building/installing btest before calling it.</p> <h4 id="circle-ci">Circle CI</h4><p>This is the <a href="https://github.com/eatonphil/bsdscheme/blob/master/.circleci/config.yml">config</a> file I use for testing BSDScheme:</p> <div class="highlight"><pre><span></span><span class="n">version</span><span class="o">:</span><span class="w"> </span><span class="mi">2</span> <span class="n">jobs</span><span class="o">:</span> <span class="w"> </span><span class="n">build</span><span class="o">:</span> <span class="w"> </span><span class="n">docker</span><span class="o">:</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">image</span><span class="o">:</span><span class="w"> </span><span class="n">dlanguage</span><span class="o">/</span><span class="n">ldc</span> <span class="w"> </span><span class="n">steps</span><span class="o">:</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">checkout</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span class="n">debian</span><span class="o">-</span><span class="n">packaged</span><span class="w"> </span><span class="n">dependencies</span> <span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span> <span class="w"> </span><span class="n">apt</span><span class="w"> </span><span class="n">update</span> <span class="w"> </span><span 
class="n">apt</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">build</span><span class="o">-</span><span class="n">essential</span> <span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$</span><span class="o">(</span><span class="n">which</span><span class="w"> </span><span class="n">ldc2</span><span class="o">)</span><span class="w"> </span><span class="sr">/usr/local/bin/</span><span class="n">ldc</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span class="n">btest</span> <span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span> <span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">btest</span> <span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">btest</span> <span class="w"> </span><span class="n">make</span> <span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Install</span><span class="w"> </span><span 
class="n">bsdscheme</span> <span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="o">|</span> <span class="w"> </span><span class="n">make</span> <span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">run</span><span class="o">:</span> <span class="w"> </span><span class="n">name</span><span class="o">:</span><span class="w"> </span><span class="n">Run</span><span class="w"> </span><span class="n">bsdscheme</span><span class="w"> </span><span class="n">tests</span> <span class="w"> </span><span class="n">command</span><span class="o">:</span><span class="w"> </span><span class="n">btest</span> </pre></div> <h4 id="travis-ci">Travis CI</h4><p>This is the <a href="https://github.com/briansteffens/bshift/blob/master/.travis.yml">config</a> Brian uses for testing BShift:</p> <div class="highlight"><pre><span></span><span class="n">sudo</span><span class="o">:</span><span class="w"> </span><span class="n">required</span> <span class="n">language</span><span class="o">:</span><span class="w"> </span><span class="n">d</span> <span class="n">d</span><span class="o">:</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ldc</span> <span class="n">script</span><span class="o">:</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">ldc</span><span class="w"> </span><span class="n">gets</span><span class="w"> </span><span class="n">installed</span><span class="w"> </span><span class="k">as</span><span class="w"> </span><span class="n">other</span><span class="w"> </span><span class="n">names</span><span class="w"> </span><span class="n">sometimes</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span 
class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="err">`</span><span class="n">which</span><span class="w"> </span><span class="n">$DC</span><span class="err">`</span><span class="w"> </span><span class="sr">/usr/local/bin/</span><span class="n">ldc</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">bshift</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">make</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/bin/bshift /usr/local/bin/</span><span class="n">bshift</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/lib /usr/local/lib/</span><span class="n">bshift</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">nasm</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">apt</span><span class="o">-</span><span class="kd">get</span><span class="w"> </span><span class="n">install</span><span class="w"> </span><span class="o">-</span><span class="n">y</span><span class="w"> </span><span class="n">nasm</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">basm</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span 
class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">basm</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">basm</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">cabal</span><span class="w"> </span><span class="n">build</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="o">..</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">ln</span><span class="w"> </span><span class="o">-</span><span class="n">s</span><span class="w"> </span><span class="n">$PWD</span><span class="sr">/basm/dist/build/basm/basm /usr/local/bin/</span><span class="n">basm</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">btest</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">git</span><span class="w"> </span><span class="n">clone</span><span class="w"> </span><span class="n">https</span><span class="o">://</span><span class="n">github</span><span class="o">.</span><span class="na">com</span><span class="sr">/briansteffens/</span><span class="n">btest</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="n">btest</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">sudo</span><span class="w"> </span><span class="n">make</span><span class="w"> </span><span class="n">install</span><span class="w"> 
</span><span class="o">&amp;&amp;</span><span class="w"> </span><span class="n">cd</span><span class="w"> </span><span class="o">..</span> <span class="w"> </span><span class="err">#</span><span class="w"> </span><span class="n">run</span><span class="w"> </span><span class="n">the</span><span class="w"> </span><span class="n">tests</span> <span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">btest</span> </pre></div> http://notes.eatonphil.com/btest-a-language-agnostic-test-runner.htmlSat, 04 Aug 2018 00:00:00 +0000Writing to be readhttp://notes.eatonphil.com/writing-to-be-read.html<p>There is a common struggle in the writing and maintenance of documentation, checklists, emails, guides, etc. Each provides immense value; a document may be the key to an important process. The goal is to remove barriers -- to encourage understanding and correct application of what has been noted -- without requiring a change in the character of the reader. That is, expect reading to be difficult and people to be lazy. <strong>Don't make things harder for your reader than need be.</strong></p> <p>Ignoring imperfections in the <em>ideas</em> transcribed into writing, there are a few particular aesthetic approaches I take to (hopefully) make my notes more effective. These ideas have been influenced by readings on writing, psychology, and user experience. In particular, I recommend <a href="https://amzn.to/2rT0dsE">On Writing Well</a>, <a href="https://amzn.to/2IttNAl">Thinking, Fast and Slow</a>, and <a href="https://www.nngroup.com/">Nielsen Norman</a> research.</p> <h3 id="language-correctness">Language correctness</h3><p>Spelling and grammatical correctness are low-hanging fruit. They are easy to achieve. Use full sentences, use punctuation, and capitalize appropriately. But don't be an unreasonable grammar stickler; language is flexible and always changing. 
Don't allow anyone the opportunity to take your work less seriously by screwing up the basics.</p> <h3 id="structuring-sentences-and-paragraphs">Structuring sentences and paragraphs</h3><p>Keep your sentences short. And avoid run-on sentences; they are always difficult to parse. If you use more than two commas in a sentence (aside from in lists), the sentence is terrible. Split it up. Commas are often used superfluously. Don't do that.</p> <p>Remember that if a comma joins two independent clauses, you can split them into two sentences with a period instead. And if you ever have a list containing another list, separate the items of the outer list with semicolons instead of commas to provide better differentiation.</p> <p>Keep your paragraphs short too. In primary school you may have learned to use 5-8 sentences per paragraph. Don't do so needlessly. 3-5 sentences can be perfectly appropriate. As both sentences and paragraphs get longer, they appear more intimidating and can discourage readers from continuing.</p> <div class="note"> <header class="note-header">Visually speaking</header> <p> Make your line height <a href="https://practicaltypography.com/line-spacing.html">120-145%</a> the height of the font. Increase the spacing between lines in a paragraph to make the paragraph less dense and more friendly. </p> <p> Keep contrast high. Don't put very gray (or colored) text on a white background. </p> <p> Additionally, a number of studies suggest that limiting the width of text increases readability. For best results, limit the width such that <a href="https://baymard.com/blog/line-length-readability">50-75 characters</a> appear per line of text. </p> </div><h4 id="don't-put-checklists-in-paragraphs">Don't put checklists in paragraphs</h4><p>If a document describes concrete steps that should be followed exactly and can be reasonably summarized, don't hide the steps within paragraphs of text. Instead use an ordered or unordered list to clearly enumerate the expectations. 
<strong>You can't expect a checklist to be followed when it is hidden within the sentences of a paragraph.</strong></p> <h3 id="structuring-sections">Structuring sections</h3><p>Any document (regardless of type) longer than 3-5 paragraphs should be broken into sub-sections with summarizing headers to aid scanning. Use the HTML <code>id</code> attribute to allow a direct link to a particular section in a long page. If the page has more than two sections or vertically flows beyond a single screen, consider adding a table of contents at the top of the page to allow the reader to find the exact section she needs.</p> <div class="note"> <header class="note-header">Visually speaking</header> <p> Don't put large headers immediately next to each other. It is disruptive to have multiple lines of large text. </p> <p> I almost completely avoid GitHub Markdown's h1/# tag because it is just too large and jarring relative to the rest of the text. It is often best for the flow of a GitHub Markdown document to stick to only h3-h4/###-#### tags for headers, using the h2/## tag for the document title. </p> </div><h3 id="in-summary">In summary</h3><p>The aesthetic flow of a document can help or hurt the experience of a reader consuming it. Good aesthetic "sense" in this regard can be boiled down to a few methods that primarily revolve around simplifying structure and facilitating the rewarding feeling of progress as a reader reads.</p> <p>Writing is difficult and takes time to evolve helpfully. The dividends are paid when a process is followed more closely and questions are readily answered in writing without further human intervention. It is incumbent on those writing and maintaining documents to organize effectively and to treat a reader's confusion as a fault of the document, not of the reader. 
It is easier to change something yourself than to expect others to change to accommodate you.</p> http://notes.eatonphil.com/writing-to-be-read.htmlFri, 18 May 2018 00:00:00 +0000Writing a simple JSON parserhttp://notes.eatonphil.com/writing-a-simple-json-parser.html<p>Writing a JSON parser is one of the easiest ways to get familiar with parsing techniques. The format is extremely simple. It's defined recursively so you get a slight challenge compared to, say, parsing <a href="https://en.wikipedia.org/wiki/Brainfuck">Brainfuck</a>; and you probably already use JSON. Aside from that last point, parsing <a href="https://en.wikipedia.org/wiki/S-expression">S-expressions</a> for Scheme might be an even simpler task.</p> <p>If you'd just like to see the code for the library, <code>pj</code>, <a href="https://github.com/eatonphil/pj">check it out on GitHub</a>.</p> <h3 id="what-parsing-is-and-(typically)-is-not">What parsing is and (typically) is not</h3><p>Parsing is often broken up into two stages: lexical analysis and syntactic analysis. Lexical analysis breaks source input into the simplest decomposable elements of a language, called "tokens". Syntactic analysis (often itself called "parsing") receives the list of tokens and tries to match patterns in them against the grammar of the language being parsed.</p> <p>Parsing does not determine the semantic viability of an input source. Semantic viability of an input source might include whether or not a variable is defined before being used, whether a function is called with the correct arguments, or whether a variable can be declared a second time in some scope.</p> <p class="note"> There are, of course, always variations in how people choose to parse and apply semantic rules, but I am assuming a "traditional" approach to explain the core concepts. 
</p><h4 id="the-json-library's-interface">The JSON library's interface</h4><p>Ultimately, there should be a <code>from_string</code> method that accepts a JSON-encoded string and returns the equivalent Python dictionary.</p> <p>For example:</p> <div class="highlight"><pre><span></span>assert_equal(from_string(&#39;{&quot;foo&quot;: 1}&#39;), {&quot;foo&quot;: 1}) </pre></div> <h3 id="lexical-analysis">Lexical analysis</h3><p>Lexical analysis breaks down an input string into tokens. Comments and whitespace are often discarded during lexical analysis so you are left with a simpler input you can search for grammatical matches during the syntactic analysis.</p> <p>Assuming a simple lexical analyzer, you might iterate over all the characters in an input string (or stream) and break them apart into fundamental, <strong>non-recursively</strong> defined language constructs such as integers, strings, and boolean literals. In particular, strings <strong>must</strong> be part of the lexical analysis because you cannot throw away whitespace without knowing that it is not part of a string.</p> <p class="note"> In a helpful lexer you keep track of the whitespace and comments you've skipped, as well as the current line number and file you are in, so that you can refer back to them at any stage in errors produced by analysis of the source. <a href="https://v8project.blogspot.com/2018/03/v8-release-66.html">The V8 JavaScript engine recently became able to reproduce the exact source code of a function.</a> This, at the very least, would need the help of a lexer to make possible. 
</p><h4 id="implementing-a-json-lexer">Implementing a JSON lexer</h4><p>The gist of the JSON lexer will be to iterate over the input source and try to find patterns of strings, numbers, booleans, nulls, or JSON syntax like left brackets and left braces, ultimately returning each of these elements as a list.</p> <p>Here is what the lexer should return for an example input:</p> <div class="highlight"><pre><span></span><span class="n">assert_equal</span><span class="p">(</span><span class="n">lex</span><span class="p">(</span><span class="s1">&#39;{&quot;foo&quot;: [1, 2, {&quot;bar&quot;: 2}]}&#39;</span><span class="p">),</span> <span class="p">[</span><span class="s1">&#39;{&#39;</span><span class="p">,</span> <span class="s1">&#39;foo&#39;</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">,</span> <span class="s1">&#39;[&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">,</span> <span class="s1">&#39;{&#39;</span><span class="p">,</span> <span class="s1">&#39;bar&#39;</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;}&#39;</span><span class="p">,</span> <span class="s1">&#39;]&#39;</span><span class="p">,</span> <span class="s1">&#39;}&#39;</span><span class="p">])</span> </pre></div> <p>Here is what this logic might begin to look like:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> 
<span class="n">json_string</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">json_string</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span> <span class="k">continue</span> <span class="c1"># TODO: lex booleans, nulls, numbers</span> <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_WHITESPACE</span><span class="p">:</span> <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">elif</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_SYNTAX</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Unexpected character: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span 
class="mi">0</span><span class="p">]))</span> <span class="k">return</span> <span class="n">tokens</span> </pre></div> <p>The goal here is to try to match strings, numbers, booleans, and nulls and add them to the list of tokens. If none of these match, check if the character is whitespace and throw it away if so. Otherwise store it as a token if it is part of JSON syntax (like left brackets). Finally throw an exception if the character/string didn't match any of these patterns.</p> <p>Let's extend the core logic here a little bit to support all the types and add the function stubs.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">def</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">def</span> <span class="nf">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">def</span> <span class="nf">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">def</span> <span class="nf">lex</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">tokens</span> <span class="o">=</span> <span class="p">[]</span> <span class="k">while</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span 
class="n">json_string</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">json_string</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span> <span class="k">continue</span> <span class="n">json_number</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">json_number</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_number</span><span class="p">)</span> <span class="k">continue</span> <span class="n">json_bool</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">json_bool</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json_bool</span><span class="p">)</span> <span class="k">continue</span> <span class="n">json_null</span><span class="p">,</span> <span class="n">string</span> <span class="o">=</span> <span class="n">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span 
class="k">if</span> <span class="n">json_null</span> <span class="ow">is</span> <span class="ow">not</span> <span class="kc">None</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="kc">None</span><span class="p">)</span> <span class="k">continue</span> <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_WHITESPACE</span><span class="p">:</span> <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">elif</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="ow">in</span> <span class="n">JSON_SYNTAX</span><span class="p">:</span> <span class="n">tokens</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Unexpected character: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span> <span class="k">return</span> <span class="n">tokens</span> </pre></div> <h4 id="lexing-strings">Lexing strings</h4><p>For the <code>lex_string</code> function, the gist will be to check if the first character is a quote. 
If it is, iterate over the input string until you find an ending quote. If you don't find an initial quote, return None and the original string. If you find an initial quote and an ending quote, return the string within the quotes and the rest of the unchecked input string. If you find an initial quote but no ending quote, raise an exception.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">json_string</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="k">if</span> <span class="n">string</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="n">JSON_QUOTE</span><span class="p">:</span> <span class="n">string</span> <span class="o">=</span> <span class="n">string</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">string</span><span class="p">:</span> <span class="k">if</span> <span class="n">c</span> <span class="o">==</span> <span class="n">JSON_QUOTE</span><span class="p">:</span> <span class="k">return</span> <span class="n">json_string</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">json_string</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">:]</span> <span class="k">else</span><span class="p">:</span> <span class="n">json_string</span> <span class="o">+=</span> <span class="n">c</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected end-of-string quote&#39;</span><span class="p">)</span> 
</pre></div> <h4 id="lexing-numbers">Lexing numbers</h4><p>For the <code>lex_number</code> function, the gist will be to iterate over the input until you find a character that cannot be part of a number. (This is, of course, a gross simplification, but being more accurate will be left as an exercise to the reader.) After finding a character that cannot be part of a number, either return a float or int if the characters you've accumulated number more than 0. Otherwise return None and the original string input.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_number</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">json_number</span> <span class="o">=</span> <span class="s1">&#39;&#39;</span> <span class="n">number_characters</span> <span class="o">=</span> <span class="p">[</span><span class="nb">str</span><span class="p">(</span><span class="n">d</span><span class="p">)</span> <span class="k">for</span> <span class="n">d</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">10</span><span class="p">)]</span> <span class="o">+</span> <span class="p">[</span><span class="s1">&#39;-&#39;</span><span class="p">,</span> <span class="s1">&#39;e&#39;</span><span class="p">,</span> <span class="s1">&#39;.&#39;</span><span class="p">]</span> <span class="k">for</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">string</span><span class="p">:</span> <span class="k">if</span> <span class="n">c</span> <span class="ow">in</span> <span class="n">number_characters</span><span class="p">:</span> <span class="n">json_number</span> <span class="o">+=</span> <span class="n">c</span> <span class="k">else</span><span class="p">:</span> <span class="k">break</span> <span class="n">rest</span> <span class="o">=</span> <span class="n">string</span><span 
class="p">[</span><span class="nb">len</span><span class="p">(</span><span class="n">json_number</span><span class="p">):]</span> <span class="k">if</span> <span class="ow">not</span> <span class="nb">len</span><span class="p">(</span><span class="n">json_number</span><span class="p">):</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">if</span> <span class="s1">&#39;.&#39;</span> <span class="ow">in</span> <span class="n">json_number</span> <span class="ow">or</span> <span class="s1">&#39;e&#39;</span> <span class="ow">in</span> <span class="n">json_number</span><span class="p">:</span> <span class="k">return</span> <span class="nb">float</span><span class="p">(</span><span class="n">json_number</span><span class="p">),</span> <span class="n">rest</span> <span class="k">return</span> <span class="nb">int</span><span class="p">(</span><span class="n">json_number</span><span class="p">),</span> <span class="n">rest</span> </pre></div> <h4 id="lexing-booleans-and-nulls">Lexing booleans and nulls</h4><p>Finding boolean and null values is a very simple string match.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">lex_bool</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">string_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">string_len</span> <span class="o">&gt;=</span> <span class="n">TRUE_LEN</span> <span class="ow">and</span> \ <span class="n">string</span><span class="p">[:</span><span class="n">TRUE_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;true&#39;</span><span class="p">:</span> <span class="k">return</span> <span class="kc">True</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">TRUE_LEN</span><span class="p">:]</span> <span class="k">elif</span> <span class="n">string_len</span> <span 
class="o">&gt;=</span> <span class="n">FALSE_LEN</span> <span class="ow">and</span> \ <span class="n">string</span><span class="p">[:</span><span class="n">FALSE_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;false&#39;</span><span class="p">:</span> <span class="k">return</span> <span class="kc">False</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">FALSE_LEN</span><span class="p">:]</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> <span class="k">def</span> <span class="nf">lex_null</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">string_len</span> <span class="o">=</span> <span class="nb">len</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">if</span> <span class="n">string_len</span> <span class="o">&gt;=</span> <span class="n">NULL_LEN</span> <span class="ow">and</span> \ <span class="n">string</span><span class="p">[:</span><span class="n">NULL_LEN</span><span class="p">]</span> <span class="o">==</span> <span class="s1">&#39;null&#39;</span><span class="p">:</span> <span class="k">return</span> <span class="kc">True</span><span class="p">,</span> <span class="n">string</span><span class="p">[</span><span class="n">NULL_LEN</span><span class="p">:]</span> <span class="k">return</span> <span class="kc">None</span><span class="p">,</span> <span class="n">string</span> </pre></div> <p>And now the lexer code is done! See the <a href="https://github.com/eatonphil/pj/blob/master/pj/lexer.py">pj/lexer.py</a> for the code as a whole.</p> <h3 id="syntactic-analysis">Syntactic analysis</h3><p>The syntax analyzer's (basic) job is to iterate over a one-dimensional list of tokens and match groups of tokens up to pieces of the language according to the definition of the language. 
If, at any point during syntactic analysis, the parser cannot match the current set of tokens against the grammar of the language, the parser will fail and possibly give you useful information as to what you gave, where, and what it expected from you.</p> <h4 id="implementing-a-json-parser">Implementing a JSON parser</h4><p>The gist of the JSON parser will be to iterate over the tokens received after a call to <code>lex</code> and try to match the tokens to objects, lists, or plain values.</p> <p>Here is what the parser should return for an example input:</p> <div class="highlight"><pre><span></span><span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="s1">&#39;{&quot;foo&quot;: [1, 2, {&quot;bar&quot;: 2}]}&#39;</span><span class="p">)</span> <span class="n">assert_equal</span><span class="p">(</span><span class="n">tokens</span><span class="p">,</span> <span class="p">[</span><span class="s1">&#39;{&#39;</span><span class="p">,</span> <span class="s1">&#39;foo&#39;</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">,</span> <span class="s1">&#39;[&#39;</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;,&#39;</span><span class="p">,</span> <span class="s1">&#39;{&#39;</span><span class="p">,</span> <span class="s1">&#39;bar&#39;</span><span class="p">,</span> <span class="s1">&#39;:&#39;</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="s1">&#39;}&#39;</span><span class="p">,</span> <span class="s1">&#39;]&#39;</span><span class="p">,</span> <span class="s1">&#39;}&#39;</span><span class="p">])</span> <span class="n">assert_equal</span><span class="p">(</span><span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">),</span> <span class="p">{</span><span 
class="s1">&#39;foo&#39;</span><span class="p">:</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="p">{</span><span class="s1">&#39;bar&#39;</span><span class="p">:</span> <span class="mi">2</span><span class="p">}]})</span> </pre></div> <p>Here is what this logic might begin to look like:</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="k">return</span> <span class="p">[],</span> <span class="n">tokens</span> <span class="k">def</span> <span class="nf">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="k">return</span> <span class="p">{},</span> <span class="n">tokens</span> <span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_LEFTBRACKET</span><span class="p">:</span> <span class="k">return</span> <span class="n">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="k">elif</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_LEFTBRACE</span><span class="p">:</span> <span class="k">return</span> <span class="n">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="k">else</span><span class="p">:</span> <span class="k">return</span> <span class="n">t</span><span class="p">,</span> <span 
class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> </pre></div> <p>A key structural difference between this lexer and parser is that the lexer returns a one-dimensional array of tokens. Parsers are often defined recursively and return a recursive, tree-like object. Since JSON is a data serialization format instead of a language, the parser should produce objects in Python rather than a syntax tree on which you could perform more analysis (or code generation in the case of a compiler).</p> <p>And, again, the benefit of having the lexical analysis happen independently of the parser is that both pieces of code are simpler and concerned with only specific elements.</p> <h4 id="parsing-arrays">Parsing arrays</h4><p>Parsing arrays is a matter of parsing array members and expecting a comma token between them or a right bracket indicating the end of the array.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_array</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">json_array</span> <span class="o">=</span> <span class="p">[]</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACKET</span><span class="p">:</span> <span class="k">return</span> <span class="n">json_array</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span> <span class="n">json</span><span class="p">,</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span
class="n">json_array</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">json</span><span class="p">)</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACKET</span><span class="p">:</span> <span class="k">return</span> <span class="n">json_array</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">elif</span> <span class="n">t</span> <span class="o">!=</span> <span class="n">JSON_COMMA</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected comma after object in array&#39;</span><span class="p">)</span> <span class="k">else</span><span class="p">:</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected end-of-array bracket&#39;</span><span class="p">)</span> </pre></div> <h4 id="parsing-objects">Parsing objects</h4><p>Parsing objects is a matter of parsing a key-value pair internally separated by a colon and externally separated by a comma until you reach the end of the object.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">parse_object</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span> <span class="n">json_object</span> <span class="o">=</span> <span class="p">{}</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> 
<span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACE</span><span class="p">:</span> <span class="k">return</span> <span class="n">json_object</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">while</span> <span class="kc">True</span><span class="p">:</span> <span class="n">json_key</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="nb">type</span><span class="p">(</span><span class="n">json_key</span><span class="p">)</span> <span class="ow">is</span> <span class="nb">str</span><span class="p">:</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">else</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected string key, got: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">json_key</span><span class="p">))</span> <span class="k">if</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="n">JSON_COLON</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected colon after key in object, got: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]))</span> <span class="n">json_value</span><span class="p">,</span> <span class="n">tokens</span> <span class="o">=</span> <span
class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:])</span> <span class="n">json_object</span><span class="p">[</span><span class="n">json_key</span><span class="p">]</span> <span class="o">=</span> <span class="n">json_value</span> <span class="n">t</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="k">if</span> <span class="n">t</span> <span class="o">==</span> <span class="n">JSON_RIGHTBRACE</span><span class="p">:</span> <span class="k">return</span> <span class="n">json_object</span><span class="p">,</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">elif</span> <span class="n">t</span> <span class="o">!=</span> <span class="n">JSON_COMMA</span><span class="p">:</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected comma after pair in object, got: </span><span class="si">{}</span><span class="s1">&#39;</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">t</span><span class="p">))</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="k">raise</span> <span class="ne">Exception</span><span class="p">(</span><span class="s1">&#39;Expected end-of-object brace&#39;</span><span class="p">)</span> </pre></div> <p>And now the parser code is done! 
See the <a href="https://github.com/eatonphil/pj/blob/master/pj/parser.py">pj/parser.py</a> for the code as a whole.</p> <h3 id="unifying-the-library">Unifying the library</h3><p>To provide the ideal interface, create the <code>from_string</code> function wrapping the <code>lex</code> and <code>parse</code> functions.</p> <div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">from_string</span><span class="p">(</span><span class="n">string</span><span class="p">):</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">lex</span><span class="p">(</span><span class="n">string</span><span class="p">)</span> <span class="k">return</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span> </pre></div> <p>And the library is complete! (ish). Check out the <a href="https://github.com/eatonphil/pj">project on Github</a> for the full implementation including basic testing setup.</p> <h3 id="appendix-a:-single-step-parsing">Appendix A: Single-step parsing</h3><p>Some parsers choose to implement lexical and syntactic analysis in one stage. For some languages this can simplify the parsing stage entirely. Or, in more powerful languages like Common Lisp, it can allow you to dynamically extend the lexer and parser in one step with <a href="https://gist.github.com/chaitanyagupta/9324402">reader macros</a>.</p> <p class="note"> I wrote this library in Python to make it more accessible to a larger audience. However, many of the techniques used are more amenable to languages with pattern matching and support for monadic operations -- like Standard ML. If you are curious what this same code would look like in Standard ML, check out the <a href="https://github.com/eatonphil/ponyo/blob/master/src/Encoding/Json.sml">JSON code in Ponyo</a>. 
</p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a short post (and a corresponding Python library) explaining lexing and parsing with JSON <a href="https://t.co/3yEZlcU6i5">https://t.co/3yEZlcU6i5</a> <a href="https://t.co/FbksvUO9aT">https://t.co/FbksvUO9aT</a> <a href="https://twitter.com/hashtag/json?src=hash&amp;ref_src=twsrc%5Etfw">#json</a> <a href="https://twitter.com/hashtag/python?src=hash&amp;ref_src=twsrc%5Etfw">#python</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/993251098931712005?ref_src=twsrc%5Etfw">May 6, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/writing-a-simple-json-parser.htmlSun, 06 May 2018 00:00:00 +0000Finishing up a FreeBSD experimenthttp://notes.eatonphil.com/finishing-up-a-freebsd-experiment.html<p>I've been using FreeBSD as my daily driver at work since December. I've successfully done my job and I've learned a hell of a lot forcing myself on CURRENT... But there's been a number of issues with it that have made it difficult to keep using, so I replaced it with Arch Linux yesterday and I no longer have those issues. This is not the first time I've forced myself to run FreeBSD and it won't be the last.</p> <h3 id="the-freebsd-setup">The FreeBSD setup</h3><p>I have a Dell Developer Edition. It employs full-disk encryption with ZFS. Not being a "disk-jockey" I cannot comment on how exhilarating an experience running ZFS is. It didn't cause me any trouble.</p> <p>It has an Intel graphics card and the display server is X. I use the <a href="https://stumpwm.github.io">StumpWM</a> window manager and the <a href="https://github.com/iwamatsu/slim">SLiM</a> login manager. 
<a href="https://www.jwz.org/xscreensaver/">xscreensaver</a> handles locking the screen, <a href="https://feh.finalrewind.org/">feh</a> gives me background images, <a href="https://github.com/dreamer/scrot">scrot</a> gives me screenshots, and <a href="http://recordmydesktop.sourceforge.net/about.php">recordMyDesktop</a> gives me video screen capture. This list should feel familiar to users of Arch Linux or other X-supported, bring-your-own-software operating systems/Linux distributions.</p> <h4 id="software-development">Software development</h4><p>I primarily work on a web application with Node/PostgreSQL and React/SASS. I do all of this development locally on FreeBSD. I run other components of our system in a Vagrant-managed VirtualBox virtual machine.</p> <h4 id="upgrading-the-system">Upgrading the system</h4><p>Since I'm running CURRENT, I fetch the latest commit on Subversion and rebuild the FreeBSD system (kernel + user-land) each weekend to get the new hotness. This takes somewhere between 1-4 hours. I start the process Sunday morning and come back to it after lunch. After the system is compiled and installed, I update all the packages through the package manager and deal with fallout from incompatible kernel modules that send me in a crash/reboot loop on boot.</p> <p>This is actually the part about running FreeBSD (CURRENT) I love the most. I've gotten more familiar with the development and distribution of kernel modules like the WiFi, Graphics, and VirtualBox drivers. I've learned a lot about the organization of the FreeBSD source code. And I've gotten some improvements <a href="https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=226015">merged</a> into the FreeBSD Handbook on how to debug a core dump.</p> <h3 id="issues-with-freebsd-on-my-hardware">Issues with FreeBSD on my hardware</h3><p>I installed CURRENT in December to get support for new Intel graphics drivers (which have since been backported to STABLE). 
Support for the built-in Intel WiFi card is also new enough that it hadn't been backported to STABLE. My WiFi ultimately never got more than 2-4Mbps down on the same networks my Macbook Pro would get 120-250Mbps down. I even bought an older Realtek USB WiFi adapter and it fared no differently. My understanding is that this is because CURRENT turns on enough debug flags that the entire system is not really meant to be used except by FreeBSD developers.</p> <p>It would often end up taking 10-30 seconds for a <code>git push</code> to happen. It would take minutes to pull new Docker images, etc. This (like everything else) does not mean you cannot do work on FreeBSD CURRENT; it just makes it really annoying.</p> <h4 id="appendix-a---headphones">Appendix A - Headphones</h4><p>I couldn't figure out the headphone jack at all. Configuring outputs via <code>sysctl</code> and <code>device.hints</code> is either really complicated or documented in a really complicated way. I posted a few times in #freebsd on Freenode and got eager assistance but ultimately couldn't get the headphone jack to produce anything without incredible distortion.</p> <p>Of course Spotify has no FreeBSD client and I didn't want to try the Linux compatibility layer (which may have worked). I tried spoofing user agents for the Spotify web app in Chrome but couldn't find one that worked. (I still cannot get a working one on Linux either.) So I'd end up listening to Spotify on my phone, which would have been acceptable except that the studio headphones I decided I needed were immensely under-powered by my phone.</p> <h4 id="appendix-b---yubikey">Appendix B - Yubikey</h4><p>I couldn't figure out how to give myself non-root access to my Yubikey which I <em>believe</em> is the reason I ultimately wasn't able to make any use of it. 
Though admittedly I don't understand a whit of GPG/PGP or Yubikey itself.</p> <h4 id="appendix-c---bhyve">Appendix C - bhyve</h4><p>I really wanted to use <a href="https://www.freebsd.org/doc/handbook/virtualization-host-bhyve.html">bhyve</a> as the hypervisor for my CentOS virtual machines instead of VirtualBox. So I spent 2-3 weekends trying to get it working as a backend for Vagrant. Unfortunately the best "supported" way of doing this is to manually mutate VirtualBox-based Vagrant boxes and that just repeatedly didn't work for me.</p> <p>When I tried using bhyve directly I couldn't get networking right. Presumably this is because NAT doesn't work well with wireless interfaces... And I hadn't put in enough weekends to understand setting up proxy rules correctly.</p> <h4 id="appendix-d---synaptics">Appendix D - Synaptics</h4><p>It is my understanding that FreeBSD has its own custom Synaptics drivers and configuration interfaces. Whether that is the case or not, the documentation is a nightmare and while I would have loved to punt to a graphical interface to keep from fat-palming the touchpad every 30 seconds, none of the graphical configuration tools seemed to work.</p> <p>A few weeks ago I think I finally got the synaptics support <em>on</em> but I couldn't scroll or select text anymore. I also had to disable synaptics, restart X, enable synaptics, and restart X on each boot for it to successfully register the mouse. I meant to post in #freebsd on Freenode where I probably would have found a solution but :shrugs:.</p> <h4 id="appendix-e---sleep">Appendix E - Sleep</h4><p>Well sleep doesn't really work on any modern operating system.</p> <h3 id="freebsd-is-awesome">FreeBSD is awesome</h3><p>I enjoy picking on my setup, but it should be impressive that you can do real-world work on FreeBSD. 
If I had a 3-4 year old laptop instead of a 1-2 year old laptop, most of my issues would be solved.</p> <p>Here are some reasons to like FreeBSD.</p> <h4 id="less-competition">Less competition</h4><p>This is kind of stupid. But it's easier to find work to do (e.g. docs to fix, bugs to report, ports to add/update, drivers to test) on FreeBSD. I'm really disappointed to be back on Linux because I like being closer to the community and knowing there are ways I can contribute and learn. It's difficult to find the right combination of fending/learning for yourself and achieving a certain level of productivity.</p> <h4 id="package-management-(culture)">Package management (culture)</h4><p>Rolling packages are really important to me as a developer. When I've run Ubuntu and Debian desktops in the past, I typically built 5-15 major (to my workflow) components from source myself. This is annoying. Rolling package systems are both easier to use and easier to contribute to... The latter point may be a coincidence.</p> <p>In FreeBSD, packages are rolling and the base system (kernel + userland) is released every year or two if you run the recommended/supported "flavors" of FreeBSD (i.e. not CURRENT). If you're running CURRENT then everything is rolling.</p> <p>Packages are binary, but you can build them from source if needed.</p> <h4 id="source">Source</h4><p>FreeBSD has an older code base than Linux does but still manages to be much better organized. OpenBSD and Minix are even better organized but I don't consider them to be in the same group of mainstream general-purpose operating systems as FreeBSD and Linux. Linux is an awful mess and is very intimidating, though I hope to get over that.</p> <h4 id="old-school-interfaces">Old-school interfaces</h4><p>There's no systemd so starting X is as simple as <code>startx</code> (but you can enable the login manager service to have it launch on boot). 
You configure your network interfaces via <code>ifconfig</code>, <code>wpa_supplicant</code>, and <code>dhclient</code>.</p> <h4 id="alternatives">Alternatives</h4><p><a href="https://www.trueos.org/">PCBSD or TrueOS</a> may be a good option for desktop users but something about the project turns me off (maybe it's the scroll-jacking website).</p> <h3 id="picking-arch-linux">Picking Arch Linux</h3><p>In any case, I decided it was time to stop waiting for <code>git push</code> to finish. I had run Gentoo at work for 3-4 months before I installed FreeBSD. But I still had nightmares of resolving dependencies during upgrades. I needed a binary package manager (not hard to find) and a rolling release system.</p> <h4 id="installing-arch-stinks">Installing Arch stinks</h4><p>Many of my old coworkers at Linode run Arch Linux at home so I've looked into it a few times. It absolutely meets my rolling release and binary packaging needs. But I've been through the installation once before (and I've been through Gentoo's) and loathed the minutes-long effort required to set up full-disk encryption. Also, systemd? :(</p> <h4 id="how-about-void-linux?">How about Void Linux?</h4><p>Void Linux looked promising and avoids systemd (which legitimately adds complexity and new tools to learn for desktop users with graphics and WiFi/DHCP networking). It has a rolling release system and binary packages, but overall didn't seem popular enough. I worried I'd be in the same boat as in Debian/Ubuntu building lots of packages myself.</p> <h4 id="what-about-arch-based-distros?">What about Arch-based distros?</h4><p>Eventually I realized <a href="http://antergos.com/">Antergos</a> and <a href="https://manjaro.org/">Manjaro</a> are two (Distrowatch-rated) popular distributions that are based on Arch and would provide me with the installer I really wanted. I read more about Manjaro and found it was pretty divergent from Arch. That didn't sound appealing. 
Divergent distributions like Manjaro and Mint exist to cause trouble. Antergos, on the other hand, seemed to be a thin layer around Arch including a graphical installer and its own few package repositories. It seemed easy enough to remove after the installation was finished.</p> <h3 id="antergos-linux">Antergos Linux</h3><p>I ran the Antergos installer and the first time around, my touchpad didn't work at all. I tried a USB mouse (that, to be honest, may have been broken anyway) but it didn't seem to be recognized. I rebooted and my touchpad worked.</p> <p>I tried to configure WiFi using the graphical NetworkManager provided but it was super buggy. Menus kept expanding and contracting as I moused over items. And it ultimately never prompted me for a password to the locked networks around me. (It showed lock icons beside the locked networks.)</p> <p>I spent half an hour trying to configure the WiFi manually. After I got it working (and "learned" all the fun new modern tools like <code>ip</code>, <code>iw</code>, <code>dhcpcd</code>, <code>iwconfig</code>, and systemd networking), the Antergos installer would crash at the last step for some error related to not being able to update itself.</p> <p>At this point I gave up. The Antergos installer was half-baked, buggy, and was getting me nowhere.</p> <h3 id="anarchy-linux">Anarchy Linux</h3><p>Still loath to spend a few minutes configuring disk encryption manually, I interneted until I found <a href="https://anarchy-linux.org/">Anarchy Linux</a> (which used to be Arch Anywhere).</p> <p>This installer seemed even more promising. 
It is a TUI installer so no need for a mouse and there are more desktop environments to pick from (including i3 and Sway) or avoid.</p> <p>It was a little concerning that Anarchy Linux also intends to be its own divergent Arch-based distribution, but in the meantime it still offers support for installing vanilla Arch.</p> <p>It worked.</p> <h3 id="life-on-arch">Life on Arch</h3><p>I copied over all my configs from my FreeBSD setup and they all worked. That's pretty nice (also speaks to the general compatibility of software between Linux and FreeBSD). StumpWM, SLiM, scrot, xscreensaver, feh, Emacs, Tmux, ssh, kubectl, font settings, keyboard bindings, etc.</p> <p>Getting Powerline working was a little weird. The <code>powerline</code> and <code>powerline-fonts</code> packages don't seem to install patched fonts (e.g. <code>Noto Sans for Powerline</code>). I prefer these to the alternative of specifying multiple fonts for fallbacks because I have font settings in multiple places (e.g. .Xresources, .emacs, etc) and the syntax varies in each config. So ultimately I cloned the <code>github.com/powerline/fonts</code> repo and ran the <code>install.sh</code> script there to get the patched fonts.</p> <p>But hey, there's a Spotify client! It works! And the headphone jack just works after installing <code>alsa-utils</code> and running <code>alsamixer</code>. And my WiFi speed is 120Mbps-250Mbps down on all the right networks!</p> <p>I can live with this.</p> <h3 id="random-background">Random background</h3><p>Each time I join a new company, I try to use the change as an excuse to force myself to try different workflows and learn something new tangential to the work I actually do. I'd been a Vim and Ubuntu desktop user since high school. In 2015, I took a break from work on the East Coast to live in a school bus in Silver City, New Mexico. I swapped out my Ubuntu and Vim dev setup for FreeBSD and Emacs. I kept GNOME 3 because I liked the aesthetic. 
I spent 6 months with this setup forcing myself to use it as my daily-driver doing full-stack, contract web development gigs.</p> <p>In 2016, I joined Linode and took up the company Macbook Pro. I wasn't as comfortable at the time running Linux on my Macbook, but a determined coworker put Arch on his. I was still the only one running Emacs (everyone else used Vim or VS Code) for Python and React development.</p> <p>I joined Capsule8 in late 2017 and put Gentoo on my Dell Developer Edition. Most people ran Ubuntu on the Dell or macOS. I'd never used Gentoo on a desktop before but liked the systemd-optional design and similarities to FreeBSD. I ran Gentoo for 3-4 months but was constantly breaking it during upgrades, and the monthly, full-system upgrades themselves took 1-2 days. I didn't have the chops or patience to deal with it.</p> <p>So I used FreeBSD for 5 months and now I'm back on Linux.</p> http://notes.eatonphil.com/finishing-up-a-freebsd-experiment.htmlSat, 28 Apr 2018 00:00:00 +0000Book Review: ANSI Common Lisphttp://notes.eatonphil.com/book-review-ansi-common-lisp.html<h4 id="score:-4.5-/-5">Score: 4.5 / 5</h4><p>Paul Graham and his editor(s) are excellent. His prose is light and easy to follow. The only awkward component of the book's organization is that he tends to use a concept one section before explicitly introducing and defining that concept. I'm not sure yet if this is a good or bad thing.</p> <h3 id="as-a-learning-resource">As a learning resource</h3><p>Among books recommended to potential Lispers, <em>ANSI Common Lisp</em> is typically written off. Graham's style of Lisp is called "non-idiomatic". That's fair, both <em>ANSI Common Lisp</em> and <em>On Lisp</em> feature aspects of Common Lisp that lend themselves to functional programming. And as those of you who've read <em>Practical Common Lisp</em> know, Common Lisp (unlike Scheme) was not designed to be a functional programming language. 
Ultimately <em>ANSI Common Lisp</em> covers the same topics <em>Practical Common Lisp</em> does, if not more. But <em>ANSI Common Lisp</em> is better written, in less space, and with shorter examples.</p> <p>I'm impressed at Graham's ability to summarize. There is a graphic illustrating symbols as a structure composed of a name, a value, a function, a package, and a property list. Although other resources (books and otherwise) mention symbols as having one or more of these components, his graphic was the first representation that clicked for me. He also provides clarity about packages being namespaces for <em>names</em> (symbols) not objects or functions.</p> <p>And toward the end of the book, there is a discussion on the "instance" abstraction (relative to the class definitions themselves) being more powerful than plain "objects" that carry around methods themselves. This has been the single most useful discussion on the implementation of object-oriented constructs I've read yet.</p> <h3 id="digression-on-practical-common-lisp">Digression on Practical Common Lisp</h3><p><em>Practical Common Lisp</em> is often called the best introduction to Common Lisp. After reading both, I'd give <em>Practical Common Lisp</em> second place or call it a tie. The issue with <em>Practical Common Lisp</em> is that it takes too long to get anywhere and the practical chapters themselves are just as much a slog. And for as big as it is, <em>Practical Common Lisp</em> still doesn't include some major (potentially confusing) aspects of "modern" Common Lisp like ASDF, Quicklisp, production deployment strategies, etc.</p> <p>Even after having read <em>Practical Common Lisp</em> I wasn't really clear how to pull together all the libraries I needed to get anything real done (e.g. scripting against an HTTP API or interacting with a SQL database). This is not to say that <em>Practical Common Lisp</em> is a bad book, it is a good book. 
But I definitely don't recommend reading it without also reading <em>ANSI Common Lisp</em>. And regardless, there are still a few of those modern concepts neither book covers.</p> http://notes.eatonphil.com/book-review-ansi-common-lisp.htmlSun, 25 Mar 2018 00:00:00 +0000Starting a minimal Common Lisp projecthttp://notes.eatonphil.com/starting-a-minimal-common-lisp-project.html<p>If you've only vaguely heard of Lisp before or studied Scheme in school, Common Lisp is nothing like what you'd expect. While functional programming is all the rage in Scheme, Common Lisp was "expressly designed to be a real-world engineering language rather than a theoretically 'pure' language" (<a href="http://www.gigamonkeys.com/book/introduction-why-lisp.html">Practical Common Lisp</a>). Furthermore, <a href="http://sbcl.org/">SBCL</a> -- a popular implementation -- is a highly optimized compiler that is competitive with <a href="https://benchmarksgame.alioth.debian.org/u64q/lisp.html">Java</a>.</p> <h3 id="building-blocks">Building blocks</h3><p>Common Lisp symbols (imagine "first-class" variables/labels) are encapsulated in namespaces called packages. However, packages don't account for organization across directories, among other things. So while packages are a part of the core Common Lisp language, the "cross-directory" organizational structure is managed by the (all-but-standard) <a href="https://github.com/fare/asdf">ASDF</a> "systems". You can think of packages as roughly similar to modules in Python whereas systems in ASDF are more like packages in Python.</p> <p>ASDF does not manage non-local dependencies. For that we use <a href="https://www.quicklisp.org/beta/">Quicklisp</a>, the de facto package manager. ASDF should come bundled with your Common Lisp installation, which I'll assume is SBCL (not that it matters). 
Quicklisp does not come bundled.</p> <h3 id="getting-quicklisp">Getting Quicklisp</h3><p>You can follow the notes on the Quicklisp <a href="https://www.quicklisp.org/beta/">site</a> for installation, but the basic gist is:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>curl<span class="w"> </span>-O<span class="w"> </span>https://beta.quicklisp.org/quicklisp.lisp $<span class="w"> </span>sbcl<span class="w"> </span>--load<span class="w"> </span>quicklisp.lisp ... *<span class="w"> </span><span class="o">(</span>quicklisp-quickstart:install<span class="o">)</span> ... *<span class="w"> </span>^D $<span class="w"> </span>sbcl<span class="w"> </span>--load<span class="w"> </span><span class="s2">&quot;~/quicklisp/setup.lisp&quot;</span> ... *<span class="w"> </span><span class="o">(</span>ql:add-to-init-file<span class="o">)</span> </pre></div> <h3 id="a-minimal-package">A minimal package</h3><p>Now we're ready to get started. Create a directory using the name of the library you'd like to package. For instance, I'll create a "cl-docker" directory for my Docker wrapper library. Then create a file using the same name in the directory with the ".asd" suffix:</p> <div class="highlight"><pre><span></span>$<span class="w"> </span><span class="nb">cd</span><span class="w"> </span>~/projects $<span class="w"> </span>mkdir<span class="w"> </span>cl-docker $<span class="w"> </span>touch<span class="w"> </span>cl-docker/cl-docker.asd </pre></div> <p>It is important for the ".asd" file to share the same name as the directory because ASDF will look for it in that location (by default).</p> <p>Before we get too far into packaging, let's write a function we'd like to export from this library. 
Edit "cl-docker/docker.lisp" (this name does not matter) and add the following:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">ps</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nv">output</span><span class="w"> </span><span class="p">(</span><span class="nv">uiop:run-program</span><span class="w"> </span><span class="o">&#39;</span><span class="p">(</span><span class="s">&quot;docker&quot;</span><span class="w"> </span><span class="s">&quot;ps&quot;</span><span class="p">)</span><span class="w"> </span><span class="ss">:output</span><span class="w"> </span><span class="ss">:string</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">line</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">&quot;(\\n+)&quot;</span><span class="w"> </span><span class="nv">output</span><span class="p">))</span> <span class="w"> </span><span class="nv">collect</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">&quot;(\\s\\s+)&quot;</span><span class="w"> </span><span class="nv">line</span><span class="p">))))</span> </pre></div> <p>This uses a portable library, "uiop", that ASDF exposes by default (we won't need to explicitly import this anywhere because the package is managed by ASDF). It will run the command "docker ps" in a subprocess and return the output as a string. 
Then we use the regex split function from the "cl-ppcre" library to split the output first into lines, take all but the first line, and split the lines up based on two or more whitespace characters.</p> <p>Next let's define the package (think module in Python) by editing "cl-docker/package.lisp" (this name also does not matter):</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">defpackage</span><span class="w"> </span><span class="nv">cl-docker</span> <span class="w"> </span><span class="p">(</span><span class="ss">:use</span><span class="w"> </span><span class="nv">cl</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="ss">:import-from</span><span class="w"> </span><span class="ss">:cl-ppcre</span><span class="w"> </span><span class="ss">:split</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="ss">:export</span><span class="w"> </span><span class="ss">:ps</span><span class="p">))</span> </pre></div> <p>Here we state the package's name, say that we want to import all Common Lisp base symbols into the package, say we want to import the "split" symbol from the "cl-ppcre" package, and say we only want to export our "ps" function.</p> <p>At this point we must also declare within the "cl-docker/docker.lisp" file that it is a part of this package:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nb">in-package</span><span class="w"> </span><span class="ss">:cl-docker</span><span class="p">)</span> <span class="p">(</span><span class="nb">defun</span><span class="w"> </span><span class="nv">ps</span><span class="w"> </span><span class="p">()</span> <span class="w"> </span><span class="p">(</span><span class="k">let</span><span class="w"> </span><span class="p">((</span><span class="nv">output</span><span class="w"> </span><span class="p">(</span><span class="nv">uiop:run-program</span><span class="w"> </span><span 
class="o">&#39;</span><span class="p">(</span><span class="s">&quot;docker&quot;</span><span class="w"> </span><span class="s">&quot;ps&quot;</span><span class="p">)</span><span class="w"> </span><span class="ss">:output</span><span class="w"> </span><span class="ss">:string</span><span class="p">)))</span> <span class="w"> </span><span class="p">(</span><span class="nb">loop</span><span class="w"> </span><span class="nv">for</span><span class="w"> </span><span class="nv">line</span><span class="w"> </span><span class="nv">in</span><span class="w"> </span><span class="p">(</span><span class="nb">rest</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">&quot;(\\n+)&quot;</span><span class="w"> </span><span class="nv">output</span><span class="p">))</span> <span class="w"> </span><span class="nv">collect</span><span class="w"> </span><span class="p">(</span><span class="nv">cl-ppcre:split</span><span class="w"> </span><span class="s">&quot;(\\s\\s+)&quot;</span><span class="w"> </span><span class="nv">line</span><span class="p">))))</span> </pre></div> <p>Next let's define the system (ASDF-level, similar to a package in Python) in "cl-docker/cl-docker.asd":</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="nv">defsystem</span><span class="w"> </span><span class="ss">:cl-docker</span> <span class="w"> </span><span class="ss">:depends-on</span><span class="w"> </span><span class="p">(</span><span class="ss">:cl-ppcre</span><span class="p">)</span> <span class="w"> </span><span class="ss">:serial</span><span class="w"> </span><span class="no">t</span> <span class="w"> </span><span class="ss">:components</span><span class="w"> </span><span class="p">((</span><span class="ss">:file</span><span class="w"> </span><span class="s">&quot;package&quot;</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="ss">:file</span><span 
class="w"> </span><span class="s">&quot;docker&quot;</span><span class="p">)))</span> </pre></div> <p>This defines all the pieces of the system for ASDF: the system name, the package definition and the component of the package ("cl-docker/docker.lisp"), and tells ASDF to make the "cl-ppcre" system on disk available to us. We also tell ASDF to process the components in the order we specified (otherwise it will pick an order that may not be what we want).</p> <p>In preparation for times when we don't have the "cl-ppcre" system (or any other dependencies) on disk, we always load the system indirectly through Quicklisp (rather than directly via ASDF) so Quicklisp can fetch any missing dependencies from its repository of systems.</p> <p>But before then -- unless you put this directory in "~/common-lisp" -- you'll need to register the directory containing the directory of your system definitions so ASDF (and Quicklisp) know where to look if you ask to load this system.</p> <p>To do this, add a ".conf" file to "~/.config/common-lisp/source-registry.conf.d/" and add the following:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="ss">:tree</span><span class="w"> </span><span class="s">&quot;~/path/to/dir/containing/system/dir&quot;</span><span class="p">)</span> </pre></div> <p>So if you had a repo called "cl-docker" in your "~/projects" directory that contained the "cl-docker" directory we previously created (that, in turn, contains the "cl-docker.asd", "package.lisp", and "docker.lisp" files) then you might create "~/.config/common-lisp/source-registry.conf.d/1-cl-docker.conf" and add:</p> <div class="highlight"><pre><span></span><span class="p">(</span><span class="ss">:tree</span><span class="w"> </span><span class="s">&quot;~/projects/cl-docker&quot;</span><span class="p">)</span> </pre></div> <h4 id="using-the-system">Using the system</h4><p>Now you can use the library from anywhere on your computer. 
Enter a Common Lisp REPL and tell Quicklisp to load the system (and download any non-local dependencies):</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>sbcl ... *<span class="w"> </span><span class="o">(</span>ql:quickload<span class="w"> </span><span class="s2">&quot;cl-docker&quot;</span><span class="o">)</span> To<span class="w"> </span>load<span class="w"> </span><span class="s2">&quot;cl-docker&quot;</span>: <span class="w"> </span>Load<span class="w"> </span><span class="m">1</span><span class="w"> </span>ASDF<span class="w"> </span>system: <span class="w"> </span>cl-docker <span class="p">;</span><span class="w"> </span>Loading<span class="w"> </span><span class="s2">&quot;cl-docker&quot;</span> .................................................. <span class="o">[</span>package<span class="w"> </span>cl-docker<span class="o">]</span> <span class="o">(</span><span class="s2">&quot;cl-docker&quot;</span><span class="o">)</span> *<span class="w"> </span><span class="o">(</span>cl-docker:ps<span class="o">)</span> </pre></div> <p>And that's it!</p> <p>For the complete source of this example package, check out this <a href="https://gist.github.com/eatonphil/59cdfeb4826c7a12a07d7055f6817a56">Gist</a>.</p> <h3 id="in-conclusion">In conclusion</h3><p>Common Lisp is easy to work with, the packages are many and mature. Configuring an ASDF package is even simpler than configuring a Python "setup.py". I didn't demonstrate pinning versions of dependencies in ASDF, but <a href="https://stackoverflow.com/a/21663603/1507139">of course</a> you can do that too. If any of this -- as simple as it is -- seems tedious, you can also use Zach Beane's (creator of Quicklisp) <a href="http://xach.livejournal.com/278047.html">quickproject</a> tool to build out the structure for you.</p> <h3 id="resources-for-common-lisp">Resources for Common Lisp</h3><p>You must read <a href="http://www.gigamonkeys.com/book/">Practical Common Lisp</a>. 
It is freely available online. It is one of the best resources I keep referring to in dealing with simple issues (as a new Lisper, I stumble on a lot of simple issues).</p> <p>Paul Graham's <a href="http://www.paulgraham.com/onlisp.html">On Lisp</a> is also a must-read when you want to get a better understanding of macros in Lisp. It will help you out with macros in Scheme too. This book is freely available online, but out of print physically. I sent <a href="https://www.lulu.com/">Lulu</a> the PDF and I received my physical copy for under $20 (including shipping).</p> <p>I'm currently making my way through <a href="http://www.cs.cmu.edu/Groups/AI/html/cltl/cltl2.html">Common Lisp the Language, 2nd Edition</a> which I believe is also freely available online. However, I don't really recommend this unless you are interested in implementing Common Lisp or are dying to learn the standard library (not a bad idea).</p> <p>Finally, Peter Norvig's <a href="https://github.com/norvig/paip-lisp">Paradigms of Artificial Intelligence Programming</a> just recently became freely available online. I haven't yet read it but I'm queuing it up. Don't let the title scare you; apparently it is primarily considered a practical guide to Common Lisp around old-school/classical AI that isn't supposed to encumber.</p> <p class="note"> It was <a href="https://twitter.com/HexstreamSoft/status/971419419862847494">pointed out</a> on Twitter that Paul Graham's <a href="http://www.paulgraham.com/acl.html">ANSI Common Lisp</a> and the <a href="http://www.lispworks.com/documentation/lw70/CLHS/Front/Contents.htm">CLHS</a> are probably better resources for the Common Lisp that exists today than Common Lisp the Language 2. CLtL2 is pre-standard. </p><p>Additionally, the <a href="http://lispcookbook.github.io/cl-cookbook/">Common Lisp Cookbook</a> is a great resource for Common Lisp recipes. 
It's been around since 2004 (on SourceForge) but has been pretty active recently and has been revived on GitHub Pages.</p> <h3 id="on-scheme">On Scheme</h3><p>I've done one or two unremarkable web prototypes in <a href="https://www.call-cc.org/">Chicken Scheme</a>, an R5RS/R7RS Scheme implementation. I don't think Chicken Scheme is the best bet for the web (I'm mostly biased to this topic) because it has no native-thread support and there are lighter interpreters out there that are easier to embed (e.g. in nginx). Chicken Scheme's "niche" is being a generally high-quality implementation with a great <a href="http://wiki.call-cc.org/chicken-projects/egg-index-4.html">collection of 3rd-party libraries</a>, but it is also not the <a href="https://ecraven.github.io/r7rs-benchmarks/">fastest</a> Scheme you could choose.</p> <p>I've worked on a larger web prototype -- a GitHub issue reporting app -- in <a href="https://racket-lang.org/">Racket</a>, a derivative of Scheme R6RS. And I've blogged <a href="http://notes.eatonphil.com/walking-through-a-basic-racket-web-service.html">favorably</a> about Racket. It is a <a href="https://ecraven.github.io/r7rs-benchmarks/">high-performance</a> interpreter with a JIT compiler, has thread support, and is also well known for its collection of <a href="https://pkgs.racket-lang.org/">3rd-party libraries</a>. However, the Racket ecosystem <a href="https://fare.livejournal.com/188429.html">suffers</a> from the same issues Haskell's does: libraries and bindings are primarily proof-of-concept only; missing documentation, tests and use. Trying to render "templatized" HTML (like Jinja allows for in Flask) without using S-exp-based syntax was a nightmare. (Read: there's space for someone to write a good string templating library.)</p> <h4 id="sorry,-racket">Sorry, Racket</h4><p>Last point on Racket (because it really is worth looking into): debugging in that GitHub issue project was not fun. The backtraces were mostly useless. 
Naively I assume this may have to do with the way Racket optimizes and rewrites functions. I was often left with zero context to find and correct my errors. But it could very well be I was making poor use of Racket.</p> <h4 id="on-the-other-hand">On the other hand</h4><p>Common Lisp (its implementations and ecosystem) seems more robust and developed. SBCL, with its great performance and native-thread support, is a promising candidate for backend web development.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">I wrote a post on putting together a Common Lisp project. It&#39;s easy! I also included some of my favorite CL books and a digression on Scheme. <a href="https://t.co/2LEDoFnAjk">https://t.co/2LEDoFnAjk</a> <a href="https://twitter.com/hashtag/commonlisp?src=hash&amp;ref_src=twsrc%5Etfw">#commonlisp</a> <a href="https://twitter.com/hashtag/lisp?src=hash&amp;ref_src=twsrc%5Etfw">#lisp</a> <a href="https://twitter.com/hashtag/scheme?src=hash&amp;ref_src=twsrc%5Etfw">#scheme</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/971398435856371712?ref_src=twsrc%5Etfw">March 7, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/starting-a-minimal-common-lisp-project.htmlMon, 05 Mar 2018 00:00:00 +0000Interview with the D Language Blog: BSDSchemehttp://notes.eatonphil.com/project-highlight-bsdscheme.html<head> <meta http-equiv="refresh" content="4;URL='https://dlang.org/blog/2018/01/20/project-highlight-bsdscheme/'" /> </head><p>This is an external post of mine. 
Click <a href="https://dlang.org/blog/2018/01/20/project-highlight-bsdscheme/">here</a> if you are not redirected.</p> http://notes.eatonphil.com/project-highlight-bsdscheme.htmlSat, 20 Jan 2018 00:00:00 +0000First few hurdles writing a Scheme interpreterhttp://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.html<p>I started working on <a href="https://github.com/eatonphil/bsdscheme">BSDScheme</a> last October, inspired to get back into language implementation after my coworker built <a href="https://github.com/briansteffens/bshift">bshift</a>, a compiler for a C-like language. BSDScheme is an interpreter for a (currently small) subset of Scheme written in D. It implements a few substantial primitive <a href="https://github.com/eatonphil/bsdscheme/blob/c49bb14182f04682a5cda4dd224b853b4fc92e92/src/runtime.d#L422">functions</a> (in under 1000 LoC!). It uses the same test framework bshift uses, <a href="https://github.com/briansteffens/btest">btest</a>. I'm going to expand here on some notes I wrote in a <a href="https://www.reddit.com/r/scheme/comments/7nvd1y/my_small_scheme_implementation_in_d/">post</a> on Reddit on some issues I faced during these first few months developing BSDScheme.</p> <p>Before I get too far, here is a simple exponent function running in BSDScheme. It demonstrates a few of the basic builtin primitives and also integers being upgraded to D's <a href="https://dlang.org/phobos/std_bigint.html">std.bigint</a> when an integer operation produces an integer unable to fit in 64 bits. 
(See the <a href="https://github.com/eatonphil/bsdscheme/blob/b202e8b5a24fe4281a06e39241f2be3cd51720fc/src/runtime.d#L99">times</a> and <a href="https://github.com/eatonphil/bsdscheme/blob/b202e8b5a24fe4281a06e39241f2be3cd51720fc/src/runtime.d#L63">plus</a> guards for details; see the <a href="https://github.com/eatonphil/bsdscheme/tree/master/examples">examples</a> directory for other examples.)</p> <div class="highlight"><pre><span></span>$<span class="w"> </span>cat<span class="w"> </span>examples/recursion.scm <span class="o">(</span>define<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span>base<span class="w"> </span>pow<span class="o">)</span> <span class="w"> </span><span class="o">(</span><span class="k">if</span><span class="w"> </span><span class="o">(=</span><span class="w"> </span>pow<span class="w"> </span><span class="m">0</span><span class="o">)</span> <span class="w"> </span><span class="m">1</span> <span class="w"> </span><span class="o">(</span>*<span class="w"> </span>base<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span>base<span class="w"> </span><span class="o">(</span>-<span class="w"> </span>pow<span class="w"> </span><span class="m">1</span><span class="o">)))))</span> <span class="o">(</span>display<span class="w"> </span><span class="o">(</span>exp<span class="w"> </span><span class="m">2</span><span class="w"> </span><span class="m">64</span><span class="o">))</span> <span class="o">(</span>newline<span class="o">)</span> $<span class="w"> </span>./bin/bsdscheme<span class="w"> </span>examples/exp.scm <span class="m">18446744073709551616</span> </pre></div> <p>The first big correction I made was to the way values are represented in memory. I originally implemented BSDScheme's value representation as a <a href="https://github.com/eatonphil/bsdscheme/pull/3/files#diff-653d5ccdaa287f13a3b2d964da52ab4aL284">struct</a> with a pointer to each possible value type. 
This design was simple to begin with but space-inefficient. I modelled a <a href="https://github.com/eatonphil/bsdscheme/pull/3">redesign</a> after the <a href="https://wiki.call-cc.org/man/4/Data%20representation">Chicken Scheme</a> data representation. It uses a struct with <a href="https://github.com/eatonphil/bsdscheme/pull/3/files#diff-c586618fe7ea7c64340046e89fd82621R14">two fields</a>, header and data. Both fields are word-size integers (currently hard-coded as 64 bits). The header stores type and length information and the data stores data.</p> <p>In this representation, simple types (integers &lt; 2^63, booleans, characters, etc.) take up only 128 bits. The integers, booleans, etc. are placed directly into the 64 bit data field. Other types (larger integers, strings, functions, etc) use the data field to store a pointer to memory allocated in the heap. Getting the conversion of these complex types right was the trickiest part of this data representation effort... lots of void-pointer conversions.</p> <p>The next big fix I made was to simplify the way generic functions dealt with their arguments. Originally I passed each function its arguments un-evaluated and left it up to each function to evaluate its arguments before operating on them. While there was nothing intrinsically wrong with this method, it was overly complicated and bug-prone. I refactored the builtin functions into two groups: <a href="https://github.com/eatonphil/bsdscheme/blob/c49bb14182f04682a5cda4dd224b853b4fc92e92/src/runtime.d#L422">normal</a> functions and <a href="https://github.com/eatonphil/bsdscheme/blob/c3286df73a32da657e780db8f33e845c9f806a9d/src/runtime.d#L435">special</a> functions. Normal function arguments are <a href="https://github.com/eatonphil/bsdscheme/blob/c3286df73a32da657e780db8f33e845c9f806a9d/src/runtime.d#L399">evaluated</a> before sending the arguments S-expression to the function. 
Special functions receive the arguments S-expression verbatim so they can decide what / when to evaluate.</p> <p>The last issue I'll talk about in this post was dealing with the AST representation. When I started out, the easiest way to get things working was to have an AST representation completely separate from the representation of BSDScheme values. This won't get you far in Scheme. In order to (eventually) support macros (and in the meantime support eval), the AST representation would have to make use of the value representation. This was the most complicated and confusing issue so far in BSDScheme. With the switch to recursive data structures, it was hard to know if an error occurred because I parsed incorrectly, or recursed over what I parsed incorrectly, or even if I was printing out what I parsed incorrectly. After some embarrassing pain, I got all the <a href="https://github.com/eatonphil/bsdscheme/pull/5">pieces in place</a> after a month and it set me up to easily support converting my original interpret function into a generic eval function that I could expose to the language like any other special function.</p> <p>One frustrating side-effect of this AST conversion is that since the parsing stage builds out trees using the internal value representation, the parsing stage is tied to the interpreter. 
From what I can tell, this basically means I have to revert back to some intermediate AST representation or throw away the parser to support a compiler backend.</p> <p>Next steps in BSDScheme include converting all the examples into tests, combining the needlessly split out lexing and parsing stage into a single read function that can be exposed into the language, fleshing out R7RS library support, and looking more into LLVM as a backend.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Wrote a full post on the first few hurdles faced writing a Scheme interpreter in D <a href="https://t.co/Cyjy7pk3OB">https://t.co/Cyjy7pk3OB</a> <a href="https://twitter.com/hashtag/scheme?src=hash&amp;ref_src=twsrc%5Etfw">#scheme</a> <a href="https://twitter.com/hashtag/schemelang?src=hash&amp;ref_src=twsrc%5Etfw">#schemelang</a> <a href="https://twitter.com/hashtag/lisp?src=hash&amp;ref_src=twsrc%5Etfw">#lisp</a> <a href="https://twitter.com/hashtag/dlang?src=hash&amp;ref_src=twsrc%5Etfw">#dlang</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/951091952740651008?ref_src=twsrc%5Etfw">January 10, 2018</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/first-few-hurdles-writing-a-scheme-interpreter.htmlWed, 10 Jan 2018 00:00:00 +0000Deploying FreeBSD on Linode unattended in minuteshttp://notes.eatonphil.com/deploying-freebsd-on-linode-unattended-in-minutes.html<p>I became a FreeBSD user over 2 years ago when I wanted to see what all the fuss was about. I swapped my y410p dual-booting Windows / Ubuntu with FreeBSD running Gnome 3. I learned a lot during the transition and came to appreciate FreeBSD as a user. I soon began running FreeBSD as my OS of choice on cloud servers I managed. 
So naturally, when I started working at Linode a year ago I wanted to run FreeBSD servers on Linode too.</p> <p>Linode is a great platform for running random unofficial images because you have much control over the configuration. I followed <a href="https://www.linode.com/docs/tools-reference/custom-kernels-distros/install-freebsd-on-linode/">existing</a> <a href="https://forum.linode.com/viewtopic.php?f=20&amp;t=12080">guides</a> closely and was soon able to get a number of operating systems running on Linodes by installing them manually: FreeBSD, OpenBSD, NetBSD, Minix3, and SmartOS to date.</p> <p>Unofficial images come at a cost though. In particular, I became frustrated having to reinstall using the installer every time I managed to trash the disk. So over the past year, I spent time trying to understand the automated installation processes across different operating systems and Linux distributions.</p> <p>Unattended installations are tough. The methods for doing them differ wildly. On RedHat, Fedora, and CentOS there is <a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/installation_guide/ch-kickstart2">Kickstart</a>. On Debian and Ubuntu there is <a href="https://wiki.debian.org/DebianInstaller/Preseed">preseeding</a>. Gentoo, Arch, and FreeBSD don't particularly have a framework for unattended installs, but the entire installation process is well-documented and inherently scriptable (if you put in the effort). OpenBSD has <a href="http://man.openbsd.org/OpenBSD-6.0/man8/autoinstall.8">autoinstall</a>. Trying to understand each and every one of these potential installation methods was pretty defeating for getting started on a side-project.</p> <p>A few weeks ago, I finally had the silly revelation that I didn't need to script the installation process -- at least initially. I only had to have working images available somewhere that could be copied to new Linodes. 
Some OSs / distributions may provide these images, but there is no guarantee that they exist or work. If I tested and hosted them for Linodes, anyone could easily run their own copy.</p> <p>I began by running the installation process as normal for FreeBSD. After the disk had FreeBSD installed on it, I rebooted into <a href="https://www.linode.com/docs/troubleshooting/rescue-and-rebuild/">Finnix</a>, <a href="https://wiki.archlinux.org/index.php/disk_cloning#Create_disk_image">made a compressed disk image</a>, and transferred it to an "image host" (another Linode in Fremont running an FTP server). Then I tested the reversal process manually to make sure a new Linode could grab the image, dd it to a disk, reboot and have a working filesystem and networking. (This transfer occurs over private networking to reduce bandwidth costs and thus limits Linode creation to the datacenter of the image host, Fremont.)</p> <p>Then it was time to script the process. I looked into the existing Linode API client wrappers and noticed none of them were documented. So I took a day to write and document a good part of a <a href="https://github.com/eatonphil/python3-linode_api3">new Linode Python client</a>.</p> <p>I got to work and out came the <a href="https://github.com/eatonphil/linode_deploy_experimental">linode-deploy-experimental</a> script. To run this script, you'll need an <a href="https://www.linode.com/docs/platform/api/api-key/">API token</a>. This script will allow you to deploy from the hosted images (which now include FreeBSD 11.0 and OpenBSD 6.0). Follow the example line in the git repo and you'll have a Linode running OpenBSD or FreeBSD in minutes.</p> <p>Clearly there's a lot of work to do on both this script and on the images:</p> <ul> <li>Fremont datacenter has the only image host.</li> <li>The script does not change the default password: "password123". 
You'll want to change this immediately.</li> <li>The script does not automatically grow the file system after install.</li> <li>The TTY config for these images currently requires you to use Glish instead of Weblish.</li> <li>And <a href="https://github.com/eatonphil/linode_deploy_experimental/issues">more</a>.</li> </ul> <p>Even if many of these issues do get sorted out (I assume they will), keep in mind that these are unofficial, unsupported images. Some things will probably never work: backups, password reset, etc. If you need help, you are probably limited to community support. You can also find me with any questions (peaton on OFTC). But for me this is at least a slight improvement on having to run through the install process every time I need a new FreeBSD Linode.</p> <p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Deploy FreeBSD and OpenBSD unattended on Linode <a href="https://t.co/j5A46ROqNM">https://t.co/j5A46ROqNM</a> <a href="https://t.co/HSqrIvBMFj">https://t.co/HSqrIvBMFj</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/840736360864591872?ref_src=twsrc%5Etfw">March 12, 2017</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/deploying-freebsd-on-linode-unattended-in-minutes.htmlSat, 11 Mar 2017 00:00:00 +0000Walking through a basic Racket web servicehttp://notes.eatonphil.com/walking-through-a-basic-racket-web-service.html<p>Racket is an impressive language and ecosystem. Compared to Python, Racket (an evolution of Scheme <a href="https://en.wikipedia.org/wiki/Scheme_(programming_language)">R5RS</a>) is three years younger. It is as concise and expressive as Python but with much more reasonable syntax and semantics. 
Racket is also faster in many cases due in part to:</p> <ul> <li><a href="https://docs.racket-lang.org/guide/performance.html#%28part._.J.I.T%29">JIT compilation</a> on x86 platforms</li> <li>support for both <a href="https://docs.racket-lang.org/reference/threads.html">concurrency</a> and <a href="https://docs.racket-lang.org/reference/places.html">parallelism</a></li> <li>support for <a href="https://docs.racket-lang.org/ts-guide/optimization.html">optimizing</a> statically-typed code</li> </ul> <p>Furthermore, the built-in web server libraries <strong>and</strong> database drivers for MySQL and PostgreSQL are fully asynchronous. This last bit drove me here from <a href="https://www.playframework.com/documentation/2.6.x/ThreadPools#Knowing-when-you-are-blocking">Play / Akka</a>. (But strong reservations about the complexity of Scala and the ugliness of Play in Java helped too.)</p> <p>With this motivation in mind, I'm going to break down the simple web service <a href="https://docs.racket-lang.org/web-server/stateless.html#%28part._stateless-example%29">example</a> provided in the Racket manuals. 
If you don't see the following code in the linked page immediately, scroll down a bit.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">web-server</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="nv">web-server/http</span><span class="p">)</span> <span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">&#39;stateless</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span> <span class="w"> </span><span class="p">(</span><span class="nf">stuffer-chain</span> <span class="w"> </span><span class="nv">serialize-stuffer</span> <span class="w"> </span><span class="p">(</span><span class="nf">md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">&#39;home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;.urls&quot;</span><span class="p">))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span 
class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!&quot;</span><span class="p">)))))</span> </pre></div> <p>First we notice the #lang declaration. Racket libraries love to make new "languages". These languages can include some entirely new syntax (like the <a href="http://docs.racket-lang.org/algol60/">Algol language implementation</a>) or can simply bundle a collection of libraries and alternative program entry points (as this web-server language does). So the first thing we'll do to really understand this code is to throw out the custom language. And while we're at it, we'll throw out all typical imports provided by the <a href="http://docs.racket-lang.org/reference/">default racket language</a> and use the racket/base language instead. This will help us get a better understanding of the Racket libraries and the functions we're using from these libraries.</p> <p>While we're throwing the language away, we notice the paragraphs just below that <a href="https://docs.racket-lang.org/web-server/stateless.html#%28part._stateless-example%29">original example</a> in the manual. They mention that the web-server language also imports a bunch of modules. We can discover which of these modules we actually need by searching in the Racket manual for functions we've used. For instance, <a href="https://docs.racket-lang.org/search/index.html?q=response%2Fxexpr">searching</a> for "response/xexpr" tells us it's in the <a href="https://docs.racket-lang.org/web-server/http.html#%28part._xexpr%29">web-server/http/xexpr</a> module. 
We'll import the modules we need using the "prefix-in" form to make function-module connections explicit.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">hash:</span><span class="w"> </span><span class="nv">web-server/stuffers/hash</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">stuffer:</span><span class="w"> </span><span class="nv">web-server/stuffers/stuffer</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">serialize:</span><span class="w"> </span><span class="nv">web-server/stuffers/serialize</span><span class="p">))</span> <span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">&#39;stateless</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span> <span class="w"> </span><span class="p">(</span><span class="nf">stuffer:stuffer-chain</span> <span class="w"> </span><span class="nv">serialize:serialize-stuffer</span> <span 
class="w"> </span><span class="p">(</span><span class="nf">hash:md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">&#39;home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;.urls&quot;</span><span class="p">))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!&quot;</span><span class="p">)))))</span> </pre></div> <p>Now we've got something that is a little less magical. We can run this file by calling it: "racket server.rkt". But nothing happens. This is because the web-server language would start the service itself using the exported variables we provided. So we're going to have to figure out what underlying function calls "start" and call it ourselves. Unfortunately searching for "start" in the manual search field yields nothing relevant. So we Google "racket web server start". Down the page on the second <a href="https://docs.racket-lang.org/web-server/run.html">search result</a> we notice an <a href="https://docs.racket-lang.org/web-server/run.html#%28part._.Examples%29">example</a> using the serve/servlet function to register the start function. 
This is our in.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">hash:</span><span class="w"> </span><span class="nv">web-server/stuffers/hash</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">stuffer:</span><span class="w"> </span><span class="nv">web-server/stuffers/stuffer</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">serialize:</span><span class="w"> </span><span class="nv">web-server/stuffers/serialize</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet-env:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span> <span class="p">(</span><span class="nb">provide</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="nv">stuffer</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">interface-version</span><span class="w"> </span><span class="ss">&#39;stateless</span><span class="p">)</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="nv">stuffer</span> <span class="w"> </span><span 
class="p">(</span><span class="nf">stuffer:stuffer-chain</span> <span class="w"> </span><span class="nv">serialize:serialize-stuffer</span> <span class="w"> </span><span class="p">(</span><span class="nf">hash:md5-stuffer</span><span class="w"> </span><span class="p">(</span><span class="nf">build-path</span><span class="w"> </span><span class="p">(</span><span class="nf">find-system-path</span><span class="w"> </span><span class="ss">&#39;home-dir</span><span class="p">)</span><span class="w"> </span><span class="s">&quot;.urls&quot;</span><span class="p">))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="nf">servlet-env:serve/servlet</span><span class="w"> </span><span class="nv">start</span><span class="p">)</span> </pre></div> <p>Run this version and it works! We are directed to a browser with our HTML. But we should clean this code up a bit. We no longer need to export anything so we'll drop the provide line. We aren't even using the interface-version and stuffer code. Things seem to be fine without them, so we'll drop those too. 
Also, looking at the serve/servlet <a href="https://docs.racket-lang.org/web-server/run.html#%28def._%28%28lib._web-server%2Fservlet-env..rkt%29._serve%2Fservlet%29%29">documentation</a> we notice some other nice arguments we can tack on.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet-env:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">start</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="nf">servlet-env:serve/servlet</span> <span class="w"> </span><span class="nv">start</span> <span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">&quot;/&quot;</span> <span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">&quot;&quot;</span> <span class="w"> 
</span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span> </pre></div> <p>Ah, that's much cleaner. When you run this code, you will no longer be directed to the /servlets/standalone.rkt path but to the site root -- set by the optional keyword argument #:servlet-path. Also, every other path you try to reach, such as /foobar, will successfully map to the start function -- set by the optional keyword argument #:servlet-regexp. Finally, we also found the option that marks the servlet as stateless -- the optional keyword argument #:stateless?.</p> <p>But this is missing two things we could really use out of a simple web service. The first is routing. We do that by looking up the documentation for the <a href="https://docs.racket-lang.org/web-server/dispatch.html">web-server/dispatch</a> module. We'll use this module to define some routes -- adding a catch-all "page not found" route to demonstrate the usage.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch:</span><span class="w"> </span><span class="nv">web-server/dispatch</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet:</span><span class="w"> </span><span class="nv">web-server/servlet-env</span><span class="p">))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">not-found-route</span><span class="w"> 
</span><span class="nv">request</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Uh-oh! Page not found.&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">home-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!!!!!!!!!&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="k">define-values</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">route-url</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">dispatch:dispatch-rules</span> <span class="w"> </span><span class="p">[(</span><span class="s">&quot;&quot;</span><span class="p">)</span><span class="w"> </span><span class="nv">home-route</span><span class="p">]</span> <span class="w"> </span><span class="p">[</span><span class="k">else</span><span class="w"> </span><span class="nv">not-found-route</span><span class="p">]))</span> <span class="p">(</span><span class="nf">servlet:serve/servlet</span> <span class="w"> </span><span 
class="nv">route-dispatch</span> <span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">&quot;/&quot;</span> <span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span> </pre></div> <p>Run this version and check out the server root. Then try any other path. Looks good. The final missing piece of this simple web service is logging. Thankfully, the <a href="https://docs.racket-lang.org/web-server-internal/dispatch-log.html">web-server/dispatchers/dispatch-log</a> module has us covered with some request formatting functions. So we'll wrap the route-dispatch function and print out the formatted request before dispatching.</p> <div class="highlight"><pre><span></span><span class="o">#</span><span class="nv">lang</span><span class="w"> </span><span class="nv">racket/base</span> <span class="p">(</span><span class="nf">require</span><span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch:</span><span class="w"> </span><span class="nv">web-server/dispatch</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">dispatch-log:</span><span class="w"> </span><span class="nv">web-server/dispatchers/dispatch-log</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">xexpr:</span><span class="w"> </span><span class="nv">web-server/http/xexpr</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">prefix-in</span><span class="w"> </span><span class="nv">servlet:</span><span class="w"> </span><span 
class="nv">web-server/servlet-env</span><span class="p">))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">not-found-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Uh-oh! Page not found.&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">home-route</span><span class="w"> </span><span class="nv">request</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">xexpr:response/xexpr</span> <span class="w"> </span><span class="o">`</span><span class="p">(</span><span class="nf">html</span><span class="w"> </span><span class="p">(</span><span class="nf">body</span><span class="w"> </span><span class="p">(</span><span class="nf">h2</span><span class="w"> </span><span class="s">&quot;Look ma, no state!!!!!!!!!&quot;</span><span class="p">)))))</span> <span class="p">(</span><span class="k">define-values</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">route-url</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">dispatch:dispatch-rules</span> <span class="w"> </span><span class="p">[(</span><span class="s">&quot;&quot;</span><span class="p">)</span><span class="w"> </span><span class="nv">home-route</span><span class="p">]</span> <span class="w"> </span><span class="p">[</span><span 
class="k">else</span><span class="w"> </span><span class="nv">not-found-route</span><span class="p">]))</span> <span class="p">(</span><span class="k">define</span><span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch/log-middleware</span><span class="w"> </span><span class="nv">req</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nb">display</span><span class="w"> </span><span class="p">(</span><span class="nf">dispatch-log:apache-default-format</span><span class="w"> </span><span class="nv">req</span><span class="p">))</span> <span class="w"> </span><span class="p">(</span><span class="nf">flush-output</span><span class="p">)</span> <span class="w"> </span><span class="p">(</span><span class="nf">route-dispatch</span><span class="w"> </span><span class="nv">req</span><span class="p">))</span> <span class="p">(</span><span class="nf">servlet:serve/servlet</span> <span class="w"> </span><span class="nv">route-dispatch/log-middleware</span> <span class="w"> </span><span class="kd">#:servlet-path</span><span class="w"> </span><span class="s">&quot;/&quot;</span> <span class="w"> </span><span class="kd">#:servlet-regexp</span><span class="w"> </span><span class="o">#</span><span class="nv">rx</span><span class="s">&quot;&quot;</span> <span class="w"> </span><span class="kd">#:stateless?</span><span class="w"> </span><span class="no">#t</span><span class="p">)</span> </pre></div> <p>Run this version and notice the logs displayed for each request. Now you've got a simple web service with routing and logging! I hope this gives you a taste for how easy it is to build simple web services in Racket without downloading any third-party libraries. Database drivers and HTML template libraries are also included and similarly well-documented. In the future I hope to add an example of a slightly more advanced web service.</p> <p class="note"> I have had huge difficulty discovering the source of Racket libraries. 
These library sources are nearly impossible to Google and search on GitHub is insane. Ideally, the official racket.org docs would link directly to the source of a function where the function is documented. Of course I could just download the Racket source and start grepping... but I'm only so interested. </p><p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Walking through a basic Racket web service <a href="https://t.co/J3us48kzga">https://t.co/J3us48kzga</a> <a href="https://twitter.com/racketlang?ref_src=twsrc%5Etfw">@racketlang</a></p>&mdash; Phil Eaton (@phil_eaton) <a href="https://twitter.com/phil_eaton/status/814674473681121280?ref_src=twsrc%5Etfw">December 30, 2016</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script></p> http://notes.eatonphil.com/walking-through-a-basic-racket-web-service.htmlThu, 29 Dec 2016 00:00:00 +0000