How can we evaluate and compare the usability of ggplot2, Vega-Lite, matplotlib and friends? Here is a summary of the research I’ve been working on and which will be presented at VIS 2023.
]]>I love the way Apple names things: FireWire, App Nap, iPhone, iPod, EarPod, AirPod, AirPlay, FairPlay etc. It’s playful and there’s a lot of reuse of words and sounds. Having just finished reading the biography of Steve Jobs, I decided to visualize various Apple product and feature names as a network to see this reuse in action. Check out the interactive version here (it works best on a desktop). I got the names from Apple’s public list of trademarks.
]]>IEEE VIS 2022 was last week in Oklahoma City and I was really happy to have been able to attend in person. The very-earnest motto of the conference was “VIS ’22 is about the people you meet” and for me at least, it really was! I’ll note that my strategy of writing a blog post describing my research and interests and tweeting it ahead of time and posting it to the Discord really helped with meeting people, so I will definitely be doing that again. That said, the personal connections I made aren’t that interesting to read about, so here is my list of highlights from the conference content. If you prefer a longer recap with more photos, Tamara Munzner’s traditional epic yearly VIS twitter thread is also available.
]]>Three years ago I had a great time attending IEEE VIS 2019 as a bit of an outsider, eager to learn about what the cutting edge of data visualization research looked like. I “attended” the next two editions remotely like everyone else, and even participated in a panel at the VisInPractice workshop last year. This year, I’m attending VIS (next week!) in person in Oklahoma City as a bona fide graduate student, as I’ve decided to take a break from working in the tool-making industry and pursue a research-focused master’s degree, advised by Michael McGuffin at the École de technologie supérieure here in Montreal. At VIS, I’ll be presenting a paper I co-wrote about VegaFusion, but that’s actually not the primary focus of my research. I’ve decided to write up a little mid-master’s progress report, both as a personal check on my progress and to increase the likelihood that I’ll have interesting conversations at VIS next week with folks who have related interests!
]]>I gave a talk at Montreal Python where I showed a diagram I’ve been working on to capture and explain how the various pieces of the Python data visualization landscape fit together. My presentation is first, starting about 7 minutes into the video.
]]>I gave a full-length webinar (name & email required to access, sorry!) about Dashboard Engine, the product my team and I have been working on for 2 years.
]]>Another four-year cycle, another interactive Montreal municipal election map (see also the 2013 and 2017 editions).
]]>I was on a panel at PyData Global 2021 where folks representing various Python “dashboarding” frameworks compared and contrasted their work with Dash, which I represented.
]]>I gave a talk at PyData Global 2021 that pulls together some ideas about why interactive data visualization matters into what I hope is an interesting and useful framework.
]]>I participated in a panel called Tools of the Trade at the 2021 VisInPractice event, part of IEEE VIS 2021.
]]>I was very proud to introduce Dashboard Engine to the world, as part of the Dash Enterprise 5.0 announcement webinar. I’ve been the product manager and team leader for this project for 18 months and it’s really gratifying to see it come to fruition.
]]>I was proud to be able to give a talk at SciPy 2021 this year, about Plotly Express and Dash!
]]>I’ve just published a personal project I’ve been thinking about doing for a few years now: revisiting figures from a 1967 book which has had a big influence on how I (and others!) think about data visualization, Jacques Bertin’s Semiology of Graphics.
]]>I was recently interviewed on the IQT Podcast about Visualizing Data During a Pandemic, and how Plotly is contributing to COVID-19 response.
]]>I was pleased to give a three-minute rundown at SciPy 2020 about what the Plotly.py team has been up to! My bit is at timecode 8:25.
]]>I was happy to be invited back for a third time to talk about Plotly during Professor Thomas Hurtut's data visualization class at Polytechnique Montréal (in English this time!).
]]>Plotly Express is the built-in high-level data visualization interface for Plotly.py, a leading interactive data visualization library for Python. With today’s release of Plotly.py 4.8, Plotly Express now gracefully operates on wide-form and mixed-form data – not just “tidy” long-form data. These new capabilities dramatically expand Plotly Express’ promise of ‘interactive data visualization in a single Python statement’, by removing the need to wrangle your data into a particular form before plotting.
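For readers who haven't run into the distinction, here is a minimal sketch in plain Python (no Plotly involved) of the reshaping step that long-form-only plotting used to require: turning one column per series into one row per (id, series, value) triple. The function name `wide_to_long` and the sample data are hypothetical; in pandas, this is roughly what `DataFrame.melt` does.

```python
def wide_to_long(rows, id_col, var_name="variable", value_name="value"):
    """Reshape wide-form records (one column per series) into long/tidy form.

    Each input dict yields one output dict per non-id column, so a table with
    N rows and K series columns becomes N * K long-form rows.
    """
    long_rows = []
    for row in rows:
        for key, val in row.items():
            if key == id_col:
                continue  # the id column is repeated on every output row, not melted
            long_rows.append({id_col: row[id_col], var_name: key, value_name: val})
    return long_rows


# Hypothetical wide-form data: one column per fruit.
wide = [
    {"year": 2019, "apples": 3, "pears": 5},
    {"year": 2020, "apples": 4, "pears": 6},
]
for r in wide_to_long(wide, "year"):
    print(r)
```

The point of wide-form support is that this wrangling step should no longer be needed before calling a Plotly Express function.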
]]>I was happy to be invited back to talk about Plotly during Professor Thomas Hurtut's data visualization class at Polytechnique Montréal (in French).
]]>I recently gave a talk about Plotly Express and Dash at Montreal Python. The description of the talk was, "You start the morning exploring some data in a Jupyter notebook with Plotly Express and after lunch you whip up a web application to give your non-programmer colleagues access to those same insights with Dash, all in under 100 lines of Python, no Javascript required. This talk will show you how Plotly's open-source libraries fit together to make this possible."
]]>Plotly Express is a new high-level Python visualization library: it’s a wrapper for Plotly.py that exposes a simple syntax for complex charts. Inspired by Seaborn and ggplot2, it was specifically designed to have a terse, consistent and easy-to-learn API: with a single import, you can make richly interactive plots in a single function call, including faceting, maps, animations, and trendlines. It comes with on-board datasets, color scales and themes, and just like Plotly.py, Plotly Express is totally free: with its permissive open-source MIT license, you can use it however you like (yes, even in commercial products!). Best of all, Plotly Express is fully compatible with the rest of the Plotly ecosystem: use it in your Dash apps, export your figures to almost any file format using Orca, or edit them in a GUI with the JupyterLab Chart Editor!
]]>I was happy to be invited to talk about Plotly during Professor Thomas Hurtut's data visualization class at Polytechnique Montréal (in French).
]]>I gave a talk at the Data Science, Design and Technology Montreal meetup which was a lot of fun, especially when other members of the community presented the apps that they'd created with Dash!
]]>I recently did a guest talk at the Arup Montreal office regarding the differences between Software Product Organizations and Professional Services Organizations.
]]>Data visualization uses algorithms to create images from data so humans can understand and respond to that data more effectively. Artificial intelligence development is the quest for algorithms that can “understand” and respond to data the same way a human can – or better. It might be tempting to conclude that the more AI development succeeds, the less relevant datavis will become. After all, will we need a speedometer to visualize how fast a car is going when it’s driving itself? Perhaps in some distant future we will delegate so much to AI systems that we lose the desire to understand the world for ourselves, but we are far from that dystopia today. As it stands, despite the name, AI development is still very much a human endeavour: AI developers make heavy use of data visualization, and AI techniques, in turn, have the potential to transform how data visualization is done.
]]>One of the easiest ways to start visualizing data is to turn a table into a heatmap: every cell gets a colour, the higher the number the brighter the colour. Unfortunately, this is often a fairly unrewarding exercise, yielding graphics that look like plaid or tartan fabric. Part of the problem is that the rows and columns of a dataset often have no natural ordering, such as time, and are instead shown in alphabetical order, or else the dataset is sorted by one of the rows or columns, rather than in an order which makes patterns pop out visually. My goal in this article is to clearly demonstrate this problem and show that there exist neat solutions to this problem using a set of techniques collectively called seriation. I’ll do this by automatically reordering the rows and columns in the following noisy-looking heatmap to make the underlying pattern very clear.
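The article's own techniques aren't reproduced here, but to give a flavour of what seriation looks like in code, here is a minimal sketch of one simple heuristic, greedy nearest-neighbour chaining: start from one end of the most dissimilar pair of rows, then repeatedly append the most similar remaining row, and do the same for columns. The function names and the toy matrix are hypothetical. Because distances between rows don't depend on column order (and vice versa), rows and columns can be ordered independently.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_order(vectors):
    """Order vectors so that neighbours in the sequence are similar."""
    n = len(vectors)
    if n <= 2:
        return list(range(n))
    # Start at one end of the most dissimilar pair (a proxy for an "extreme" row).
    start, _ = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda p: sq_dist(vectors[p[0]], vectors[p[1]]),
    )
    order = [start]
    remaining = set(range(n)) - {start}
    while remaining:
        last = vectors[order[-1]]
        nxt = min(remaining, key=lambda k: sq_dist(last, vectors[k]))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def seriate(matrix):
    """Reorder the rows and columns of a matrix (list of lists) for display."""
    row_order = greedy_order(matrix)
    columns = [list(col) for col in zip(*matrix)]
    col_order = greedy_order(columns)
    return [[matrix[r][c] for c in col_order] for r in row_order]


# A smooth hypothetical pattern, then scrambled so it looks like "plaid".
pattern = [[i * j + i + j for j in range(5)] for i in range(6)]
rows, cols = [3, 0, 5, 1, 4, 2], [2, 4, 0, 3, 1]
scrambled = [[pattern[r][c] for c in cols] for r in rows]
for row in seriate(scrambled):
    print(row)
```

The recovered ordering may come out flipped along either axis, which doesn't matter visually. On real, noisy data this greedy pass can make poor choices, so it is only meant to illustrate the goal of reordering.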
]]>Many a bored long-haul flight passenger has asked themselves why the flight path on the map is curved, and if it wouldn’t be faster to just fly straight there. In fact, airlines try very hard to keep their flight paths as straight as possible. It’s just that the rectangular world maps we are accustomed to looking at project the 3-dimensional earth onto a 2-dimensional surface such that any long straight line not directly along the equator or perpendicular to it will appear curved. Making the equator special in this way makes some sense as a default way to draw maps, because of the way the earth spins on its axis, but we can just as easily choose any other straight line path for this treatment, and doing so gives us an interesting perspective on the world and on maps.
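To put a number on it, here's a small sketch comparing the great-circle (shortest) distance, computed with the haversine formula, against the length of a route that follows a constant line of latitude, which is what a straight horizontal line on a typical rectangular map represents. It assumes a perfectly spherical Earth with a mean radius of 6371 km, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with its mean radius


def great_circle_km(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Shortest surface distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def along_parallel_km(lat, lon1, lon2, radius=EARTH_RADIUS_KM):
    """Distance when flying due east or west along a constant latitude."""
    d_lon = abs(lon2 - lon1) % 360
    d_lon = min(d_lon, 360 - d_lon)  # go the short way around
    return radius * math.cos(math.radians(lat)) * math.radians(d_lon)


# Two points at 60°N on opposite sides of the globe.
print(round(great_circle_km(60, 0, 60, 180)))   # route over the North Pole
print(round(along_parallel_km(60, 0, 180)))     # "straight" on a rectangular map
```

In this extreme case the route over the pole is roughly 6,700 km, versus roughly 10,000 km along the parallel, which is why the shortest path looks so strongly curved on a rectangular map.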
]]>I have recently read some thought-provoking articles that discussed data visualization by analogy to photography. I really like this analogy, both from a process perspective – photography and data visualization – and a people perspective – photographers and data visualizers. Anyone who takes a picture with a camera is a photographer in that moment, and anyone who makes a chart, diagram or map based on data is a data visualizer while they’re doing that. Both photographers and data visualizers produce images of information emanating from their subjects, to make a point, to record, to inform, to delight. Photographers choose the lighting of their subject and framing of their shots, then use cameras to capture their image. Data visualizers choose the data they use about their subject and the mapping of data attributes to visual attributes, then use algorithms to produce graphics. Both can post-process their images to exert even finer control over their products.
]]>Pivot tables are interactive data exploration and summarization tools which have been a critical part of data analysts’ toolkits for the past 25 years, especially in spreadsheets like Excel. Five years ago I built PivotTable.js, which has since become one of the most popular Javascript pivot table implementations. I initially wrote it in CoffeeScript and packaged it up as a jQuery plugin, but the front-end world has evolved a lot since then so today I’m excited to announce a new-and-improved version of PivotTable.js for the modern web: react-pivottable.
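At its core, every pivot table does the same thing: group records by the values of one attribute along the rows and another along the columns, then aggregate a third attribute within each cell. Here's a minimal Python sketch of that core operation; it's not PivotTable.js or react-pivottable code (those are JavaScript), and the `pivot` function and sample records are hypothetical.

```python
from collections import defaultdict


def pivot(records, row_key, col_key, value_key, agg=sum):
    """Group records into (row, column) cells and aggregate each cell.

    Returns a nested dict: table[row_value][col_value] -> aggregated value.
    Cells with no matching records are simply absent.
    """
    buckets = defaultdict(list)
    for rec in records:
        buckets[(rec[row_key], rec[col_key])].append(rec[value_key])
    table = defaultdict(dict)
    for (row_val, col_val), values in buckets.items():
        table[row_val][col_val] = agg(values)
    return dict(table)


sales = [
    {"region": "East", "year": 2016, "amount": 10},
    {"region": "East", "year": 2016, "amount": 5},
    {"region": "East", "year": 2017, "amount": 8},
    {"region": "West", "year": 2017, "amount": 7},
]
print(pivot(sales, "region", "year", "amount"))           # sums per cell
print(pivot(sales, "region", "year", "amount", agg=len))  # record counts
```

Roughly speaking, the interactivity of a pivot table UI amounts to re-running this kind of grouping with different row, column and aggregator choices as you drag attributes around.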
]]>Last November 5th was Municipal Election Day in Montreal and I’m proud to say I was one of the hundreds of volunteers who got out the vote to elect Valérie Plante as Montreal’s first female mayor and the leader of Projet Montréal. However, unlike most volunteers, who were making phone calls, going door to door or driving electors to polling stations, I was at the campaign headquarters in front of my computer writing SQL queries and interpreting data from a real-time web dashboard I’d built the week before. In this post I’ll explain some of what I learned through this experience about get-out-the-vote (GOTV) efforts, and a bit more about the small role I played.
]]>I recently put together a simple example of Plotly.js and Crossfilter.js working together to produce a set of linked data visualizations.
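The key idea behind linked views like these is that brushing one chart filters the data shown in all the others. Here's a minimal Python model of that behaviour; it is not Crossfilter.js's actual API, and the `TinyCrossfilter` class is hypothetical. It mirrors my understanding of one Crossfilter design choice, namely that a group ignores the filter on its own dimension, so the chart you're brushing still shows its full distribution.

```python
from collections import Counter


class TinyCrossfilter:
    """Hypothetical, minimal model of crossfilter-style linked filtering."""

    def __init__(self, records):
        self.records = records
        self.filters = {}  # dimension name -> predicate on that dimension's value

    def filter(self, dim, predicate):
        self.filters[dim] = predicate

    def clear(self, dim):
        self.filters.pop(dim, None)

    def group_counts(self, dim):
        """Count records per value of `dim`, applying every other dimension's filter."""
        def passes(rec):
            return all(pred(rec[d]) for d, pred in self.filters.items() if d != dim)
        return Counter(rec[dim] for rec in self.records if passes(rec))


trips = [
    {"hour": 8, "day": "Mon"},
    {"hour": 9, "day": "Mon"},
    {"hour": 8, "day": "Tue"},
    {"hour": 17, "day": "Tue"},
]
cf = TinyCrossfilter(trips)
cf.filter("hour", lambda h: h < 10)   # brush the "hour" histogram
print(cf.group_counts("day"))          # the "day" chart reacts to the brush
print(cf.group_counts("hour"))         # the brushed chart keeps all its bars
```

In a browser, each call to `group_counts` would correspond to redrawing one linked chart after a brush event.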
]]>Many people reacted to my interactive map of Montreal election results with requests for tables of hard numbers, and I’m happy to oblige! I grabbed the official election results from the Montreal open data portal and aggregated them by district to produce an easy-to-use CSV file. I also created a page that preloads a PivotTable.js instance with the data, for interactive data exploration fun!
]]>The 2017 edition of my interactive map of Montreal election results is now available, and I’m so pleased about the results it shows! In 2013 I made a map a couple of months after the election and it was considered so unusual it was talked about on the radio. But times have changed: this time the data was available within days, and within hours of that, news outlets had similar maps on their websites. I still like mine better though because it shows data from all 103 races, rather than just the mayoralty. The 2013 map is still around, for reference.
]]>My friend Mark Weiss recently started a podcast called Using Reflection and I was pleased to be interviewed as a guest on his 6th episode. We had a great chat about datavis and engineering ethics, among other topics.
]]>In an agile software development project, the role of the product owner comes with the responsibility of managing the product backlog. Most popular definitions of the backlog are quite broad, encouraging product owners to include in it every feature request, bugfix and idea related to the product. I have found it more helpful, however, to distinguish between backlog items on the one hand (i.e. changes that as a product owner I intend to be made to the product) and feedback items on the other (i.e. bug reports, feature requests, ideas etc.).
]]>I recently did a guest lecture (in French!) at the Université de Montréal in the context of the École d’été en Architecture de l’information (Summer program for Information Architecture).
]]>As part of my second collaboration with data journalist Roberto Rocha, I made an interactive map for his recent piece on where and when Car2Go vehicles park in Montreal (shorter English version). Earlier in the year, Roberto told me about people in certain neighbourhoods complaining about Car2Go vehicles causing parking problems. He and I hit upon the idea of querying Car2Go’s API every few minutes to find out where all their available cars were parked in Montreal, to take a look at some real data on this issue. I’m a huge fan and user of car-sharing services and in my neighbourhood of Rosemont I feel they prevent parking problems by enabling lower car ownership. As my map makes clear, however, this is not the case in areas like the Mile End. In any case, the CBC articles do a great job of reporting on the situation, and I wanted to share some of the thinking and code that went into making the map.
]]>After my photographic metro platform maps went viral last week, I received a lot of feedback in the form of emails and comments, telling me about the experiences of subway riders in other cities. Here are some interesting vignettes.
]]>The photo above (click here for a zoomable version) is a collage of panoramic scans of the Angrignon-bound platforms of the Montreal metro’s green line. I used my phone to record videos from the rear-most window of the train and wrote a bit of software to stitch the frames together. My goal was to create a way to figure out where to stand while waiting for the metro so as to get out closest to where you want to go at your destination, and I used these scans to build a little interactive comparison page for just this purpose.
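One common way to build this kind of panorama from video shot out of a moving train is a slit-scan: take the same narrow column of pixels from every frame and lay those columns side by side. Here's a minimal sketch of that general technique, not the actual stitching code; it assumes frames are nested lists of pixel values, and the `slit_scan` function and toy frames are hypothetical. A real version would also need to compensate for the train's changing speed.

```python
def slit_scan(frames, column):
    """Build a panorama by sampling one vertical slit from each video frame.

    frames: list of frames; each frame is a list of rows (height x width) of pixels.
    column: x index of the slit to sample from every frame.
    Returns an image with the frames' height and one column per frame.
    """
    if not frames:
        return []
    height = len(frames[0])
    # Output pixel (y, f) is pixel (y, column) of frame f.
    return [[frame[y][column] for frame in frames] for y in range(height)]


# Hypothetical 4-frame "video" of 2x3 greyscale frames.
frames = [[[f * 10 + y * 100 + x for x in range(3)] for y in range(2)] for f in range(4)]
for row in slit_scan(frames, column=1):
    print(row)
```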
]]>Visualizing datasets as circle-and-arrow networks or graphs is a popular and easy way to make attention-grabbing graphics. As the number of data points grows, however, these graphics become crowded and marginally useful. Dimensionality-reduction algorithms such as t-SNE represent a different approach to visualizing the relationships between large numbers of data points, which in certain cases can produce graphics which do not suffer from the same types of problems as graph-visualization approaches. In this talk I compare and contrast the two approaches and give pointers to those who wish to try them out.
]]>By using machine learning algorithms, we are increasingly able to use computers to perform intellectual tasks at a level approaching that of humans. Given that computers cost less than employees, many people are afraid that humans will therefore necessarily lose their jobs to computers. Contrary to this belief, in this article I show that even when a computer can perform a task more economically than a human, careful analysis suggests that humans and computers working together can sometimes yield even better business outcomes than simply replacing one with the other.
Specifically, I show how a classifier with a reject option can increase worker productivity for certain types of tasks, and I show how to construct and tune such a classifier from a simple scoring function by using two thresholds. I begin with a parable featuring the same characters as the one from Part 1 of this Machine Learning Meets Economics series. I recommend reading Part 1 first, as it sets up much of the terminology I use here.
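Here's a minimal sketch of the two-threshold idea: scores below `low` are auto-labelled negative, scores at or above `high` are auto-labelled positive, and everything in between is deferred to a human. The tuning step shown (a brute-force search for the thresholds that automate the largest share of items while keeping the error rate on automated items within a budget) is one plausible criterion assumed here for illustration, not necessarily the economic criterion from the article; the function names and data are hypothetical.

```python
def classify_with_reject(score, low, high):
    """Two-threshold classifier: 'negative', 'positive', or 'reject' (ask a human)."""
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if score < low:
        return "negative"
    if score >= high:
        return "positive"
    return "reject"


def evaluate(scores, labels, low, high):
    """Return (share of items decided automatically, error rate on those items)."""
    decided = errors = 0
    for score, is_positive in zip(scores, labels):
        decision = classify_with_reject(score, low, high)
        if decision == "reject":
            continue
        decided += 1
        if (decision == "positive") != is_positive:
            errors += 1
    automation = decided / len(scores)
    error_rate = errors / decided if decided else 0.0
    return automation, error_rate


def tune(scores, labels, max_error):
    """Brute-force the (low, high) pair that automates the most within max_error."""
    candidates = sorted(set(scores)) + [float("inf")]
    best_automation, best_pair = 0.0, None
    for i, low in enumerate(candidates):
        for high in candidates[i:]:
            automation, error_rate = evaluate(scores, labels, low, high)
            if error_rate <= max_error and automation > best_automation:
                best_automation, best_pair = automation, (low, high)
    return best_automation, best_pair


scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
labels = [False, False, True, False, True, True]
print(tune(scores, labels, max_error=0.0))
```

In this toy data the two ambiguous middle items (0.4 and 0.6) end up routed to a human, while the four clear-cut ones are handled automatically.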
]]>I presented MLDB today at the BigData Innovators Gathering (BIG) 2016 conference.
]]>I was recently invited to give a talk about auction theory and online advertising at Concordia University for a course entitled Social and Information Networks, which uses a really interesting textbook called Networks, Crowds, and Markets.
]]>I was recently invited to give a talk at HTML5mtl, and I chose to speak about my experiences with open-sourcing PivotTable.js.
]]>The business world is full of streams of items that need to be filtered or evaluated: parts on an assembly line, resumés in an application pile, emails in a delivery queue, transactions awaiting processing. Machine learning techniques are increasingly being used to make such processes more efficient: image processing to flag bad parts, text analysis to surface good candidates, spam filtering to sort email, fraud detection to lower transaction costs etc.
In this article, I show how you can take business factors into account when using machine learning to solve these kinds of problems with binary classifiers. Specifically, I show how the concept of expected utility from the field of economics maps onto the Receiver Operating Characteristic (ROC) space often used by machine learning practitioners to compare and evaluate models for binary classification. I begin with a parable illustrating the dangers of not taking such factors into account. This concrete story is followed by a more formal mathematical look at the use of indifference curves in ROC space to avoid this kind of problem and guide model development. I wrap up with some recommendations for successfully using binary classifiers to solve business problems.
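To make the expected-utility idea concrete: if a fraction p of items are positive and each outcome (true or false positive, true or false negative) has a business value, then the expected value per item of operating at a point (FPR, TPR) in ROC space is linear in FPR and TPR, so points of equal utility lie on straight lines, the indifference curves. Here's a minimal Python sketch; the utility numbers, function names and candidate points are hypothetical.

```python
def expected_utility(fpr, tpr, prevalence, u_tp, u_fn, u_fp, u_tn):
    """Expected value per item at ROC point (fpr, tpr).

    prevalence: fraction of items that are actually positive.
    u_*: value of a true positive, false negative, false positive, true negative.
    """
    p = prevalence
    return (p * (tpr * u_tp + (1 - tpr) * u_fn)
            + (1 - p) * (fpr * u_fp + (1 - fpr) * u_tn))


def indifference_slope(prevalence, u_tp, u_fn, u_fp, u_tn):
    """Slope dTPR/dFPR of the lines of constant expected utility in ROC space."""
    p = prevalence
    return ((1 - p) * (u_tn - u_fp)) / (p * (u_tp - u_fn))


def best_operating_point(points, **utility_params):
    """Pick the (fpr, tpr) point with the highest expected utility."""
    return max(points, key=lambda pt: expected_utility(pt[0], pt[1], **utility_params))


roc_points = [(0.0, 0.0), (0.2, 0.8), (0.5, 0.9), (1.0, 1.0)]
cheap_fp = dict(prevalence=0.5, u_tp=1, u_fn=0, u_fp=-1, u_tn=0)
costly_fp = dict(prevalence=0.5, u_tp=1, u_fn=0, u_fp=-5, u_tn=0)
print(best_operating_point(roc_points, **cheap_fp))
print(best_operating_point(roc_points, **costly_fp))
```

The slope comes from setting the change in expected utility to zero: p(u_tp − u_fn)·ΔTPR + (1 − p)(u_fp − u_tn)·ΔFPR = 0. With cheap false positives the best of these points is (0.2, 0.8); with costly ones it is (0, 0), i.e. flagging nothing at all, so the same model can be a good or a bad business decision depending on factors outside the model.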
]]>I was excited to be invited to give a talk at the JavaScript Open Day Montreal about data visualization in JavaScript.
]]>I recently gave a talk at the Montreal R User Group about my favourite data visualization library, ggplot2, as well as rpivotTable, the R interface to my own PivotTable.js.
As you can see in the video above, during the talk I just scrolled through an R file in RStudio. What you see below is the result of slightly modifying that file and running it through the RMarkdown process to capture the output.
]]>I went back to my alma mater at the University of Toronto to give a talk at PyCon Canada on how to make Jupyter even more magical than it already is with cell magic extensions.
]]>I was happy to oblige when I was invited to give a talk at Big Data Montreal about the project I work on at Datacratic: the Machine Learning Database (MLDB).
]]>PivotTable.js is a Javascript Pivot Table and Pivot Chart library with drag’n’drop interactivity, and it can now be used with Jupyter/IPython Notebook via the pivottablejs module. This has been possible for RStudio users for a while now via rPivotTable, but why should they have all the fun?
For the latest in my series of maps of the results of the 2013 Montreal municipal election, I’ve produced a pair of graduated symbol maps, representing the results as pie charts overlaid on a base map. It’s interesting to compare this type of visualization to my previous efforts: the dot map, the choropleth, and the ternary plot.
]]>I had the pleasure of visiting with many members of my wife’s family this summer, some of whom are genealogy enthusiasts. I made a pair of visualizations of the data they had collected: one in the run-up to a family reunion and one to find my way around the large family we visited in Saskatchewan.
]]>I learned to program in the late nineties through a game called RoboWar. RoboWar provides a simulated arena in which two virtual robots try to destroy each other by running a program written by their respective players. The goal was to create a robot that could win a tournament; tournaments were held about twice a year, and entrants would email their creation to someone with a fast computer who would simulate hundreds of battles and then let everyone know who had won and make the entries public.
]]>Earlier this year, I collaborated with a reporter from the Montreal Gazette to analyze a dataset containing information about 1.4 million service requests received by the City of Montreal from its citizens. The resulting article was entitled "Montreal's 311 records shed light on residents' concerns — to a point" and credits me at the bottom. I have also published my own interactive analysis of the dataset here: Montreal 311 Service Requests, an Analysis. The dataset, obtained from the city's Gestion des demandes clients (GDC) system via an Access to Information request, covered the five years from 2008 to 2012 and contained the date and a very short description for each request, and in most cases, an address. The service requests were received by the city through its 311 phone line or at service counters throughout the city.
]]>I recently did the first ever public demo of the product I'm working on at Datacratic: the Machine Learning Database.
]]>I’ve always been curious to see what kinds of patterns would be visible if one tried to visualize the distribution of house numbers (the number in a street address) across a city like Montreal. This week I took some time to learn enough about the OpenStreetMap system to gather and plot the data.
]]>I recently organized and MC’ed the fifth Visualization Montreal meetup, and I think it was a great success! The concept was to have a series of 7-minutes-max flash presentations from Montrealers where each one would show off a single visualization project. The rules were: no slides, no tools, just one publicly available data visualization. We had 12 presenters including me, with a good mix of types of data and visualizations. Below is the list of visualizations that were presented, and you can find photos of the event here.
]]>The visualization I presented at VisMtl 5 was entitled “Canadian Members of Parliament in 2012 by Province, Party, Age & Gender” and is shown above.
]]>A recent article on AdExchanger asks “In the supposedly super-efficient world of RTB, why would publishers continue to waterfall their demand sources?”. The article goes on to say that the publisher’s justification is “Because it works” but that “Any economist could tell you that this is a bad idea”. I’m not an economist but I can still pull together enough auction theory to show that this practice isn’t necessarily a bad one today.
]]>I gave a talk in Barcelona at the PAPIs.io 2014 Predictive APIs conference last November.
]]>For part of my presentation at Montreal Python, I made an interactive map of the various sub-sections of the website Reddit (called subreddits). You can take a look at the interactive version or see a static annotated one above. The interactive one includes basic info on how I made it and full details are in the presentation. I got some nice comments in the /r/DataIsBeautiful subreddit post.
]]>I gave a talk at Montreal Python on Data Science and Unsupervised Machine Learning with scikit-learn. The video is above and I posted all of my presentation materials online.
]]>Last week, Bloomberg came out with an article on RTB arbitrage, which included a couple of sentences that made it sound a lot like it was possible to front-run an RTB auction: “Some buy from an exchange and sell it right back to that very same exchange” and “Some agencies are poorly connected to exchanges and can’t respond to a first auction in time, allowing middlemen to buy and flip within the same market”. This seemed surprising to me at first, given that all auction participants (as far as I know) get the same opportunity to bid on an impression, so how could you make money buying and selling the same impression on the same exchange? Upon further thought, however, here’s a theory about how it might work.
]]>I gave a talk at Visualization Montréal entitled Maps, Tools, Stories. Check out the synced slides and video!
]]>Data visualization, by definition, involves making a two- or three-dimensional picture of data, so when the data being visualized inherently has many more dimensions than two or three, a big component of data visualization is dimensionality reduction. Dimensionality reduction is also often the first step in a big-data machine-learning pipeline, because most machine-learning algorithms suffer from the Curse of Dimensionality: more dimensions in the input means you need exponentially more training data to create a good model. Datacratic’s products operate on billions of data points (big data) in tens of thousands of dimensions (big problem), and in this post, we show off a proof of concept for interactively visualizing this kind of data in a browser, in 3D (of course, the images on the screen are two-dimensional but we use interactivity, motion and perspective to evoke a third dimension).
]]>Video and slides from my talk at the kickoff of Big Data Week Montreal 2014.
]]>If you’ve ever been browsing the web and been annoyed by those One Weird Trick ads, or by ads for that product you looked at online last month and then bought offline, you’ve probably given a thought to blocking ads altogether. The response to this idea, from people who run websites for a living, ranges from “it’s unethical” to “it’s stealing!”. According to them, the reason you get to use a website without paying for it yourself is that in exchange you see ads and website owners get paid by the advertisers. That’s a polite summary of the great Ad-Blocking Debate, which has been going on since the early days of the commercial web. I’m not going to take sides here; rather I’ll propose a compromise enabled by a recent development in online advertising technology. I’m going to describe a “weird trick,” if you will: how to use the same system as those ads that follow you around to block ads, all the while ensuring that the websites you frequent have nothing to complain about.
]]>Recently I made some maps of the 2013 Montreal municipal elections, showing voting results down to the ballot-box level, using data from the Montreal Open Data Portal. It turns out, however, that not all of the ballot boxes in that data set are associated with a small geographical area like the ones shown in my by-ballot-box map, and furthermore, those ballot boxes have very different numbering schemes than the ones that do match up with small block-sized areas, numbers like 901 and 601 and 001A instead of small numbers from 1 to 100ish, like the others.
So what gives? These results appear to be from the early-voting polls, which, given that there are fewer of them, cover a larger area per ballot box. In this post I take a look at how leaving this data out of my maps skews the results I present.
]]>The Montreal municipal elections were just over two months ago, but I played with the election results dataset over the holidays anyway, as an excuse to work with a type of data I don’t normally have much to do with: geographical data. Without further ado, here is the map I made, and this post explains a bit about the process.
]]>In the Montreal mayoral election last November, nearly 85% of the vote went to one of the top three candidates. A pie chart is a simple way to show the breakdown of votes between candidates for the whole election, say, but what if you wanted to look at the vote breakdown for each of the 52 electoral districts? 52 pie charts is kind of hard to look at and discern any sort of pattern. It turns out that if you only want to look at the top three candidates, you can use a ternary plot to good effect, like I did in the image above. There’s an interactive version as well which helps make the link between the ternary plot and the map via mouse-overs.
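The geometry behind a ternary plot is simple: normalize each district's votes for the three candidates so they sum to 1, then use those shares as weights on the three corners of an equilateral triangle. A minimal Python sketch follows; which candidate goes on which corner is an arbitrary choice, and the function name and vote counts are hypothetical.

```python
import math

# Corners of an equilateral triangle with unit side (assignment is arbitrary).
CORNERS = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]


def ternary_xy(a, b, c):
    """Map three non-negative vote counts to a 2D point inside the triangle."""
    total = a + b + c
    if total <= 0:
        raise ValueError("need at least one vote")
    weights = (a / total, b / total, c / total)  # shares of the top-three vote
    x = sum(w * cx for w, (cx, _) in zip(weights, CORNERS))
    y = sum(w * cy for w, (_, cy) in zip(weights, CORNERS))
    return x, y


print(ternary_xy(1200, 300, 500))  # a district leaning towards the first candidate
```

A district that voted entirely for one candidate lands on that candidate's corner, and a perfect three-way tie lands at the triangle's centre, so each of the 52 districts becomes a single dot that can be compared at a glance.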
]]>I was inspired by some cool "dot map" visualization projects around the internet (North American Census Dotmap, Toronto Visible Minorities Dot Map) to create a similar visualization of the results for the recent Montreal municipal election. I leveraged data from the Montreal Open Data portal to create the map above. There are coloured dots for (almost) each vote for the mayoralty for the top three candidates, randomly located within the catchment area for the polling booth it came from. What I like about this map is that it shows the results in all their messiness rather than neatly colour-coding entire neighbourhoods like a choropleth map would. People live and vote in arbitrary-looking clusters, not in neat blocks!
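Here's a minimal sketch of the dot-placement step: for each polling area, drop one dot per vote at a uniformly random spot inside the area's boundary polygon, using rejection sampling (draw from the bounding box, keep only points that pass a ray-casting point-in-polygon test). This is one straightforward way to do it, not necessarily how the map was built; the polygon, vote counts and function names are hypothetical, and real boundaries would come in map coordinates.

```python
import random


def point_in_polygon(x, y, polygon):
    """Ray-casting test: count edge crossings of a horizontal ray from (x, y)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def random_dots(polygon, count, rng):
    """Uniform random points inside a polygon via bounding-box rejection sampling."""
    xs = [p[0] for p in polygon]
    ys = [p[1] for p in polygon]
    dots = []
    while len(dots) < count:
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if point_in_polygon(x, y, polygon):
            dots.append((x, y))
    return dots


def dot_map(areas, rng):
    """areas: list of (polygon, {candidate: votes}) -> list of (x, y, candidate)."""
    out = []
    for polygon, votes in areas:
        for candidate, n in votes.items():
            out.extend((x, y, candidate) for x, y in random_dots(polygon, n, rng))
    return out


l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
dots = dot_map([(l_shape, {"A": 5, "B": 3})], random.Random(42))
print(len(dots), dots[0])
```

Seeding the random generator keeps the map stable between renders, so dots don't jump around each time it is regenerated.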
]]>There was a municipal election here in Montreal on November 3, and I had the opportunity to help build an election results dashboard to be projected on the big screen at the election-night party for the political party I support: Projet Montréal. The dashboard is still up with final results. I worked with Nicolas Marchildon, who had put together a similar system for the 2009 election.
]]>Between 2011 and 2013 I wrote a popular 5-part series of articles about Datacratic's real-time bidding algorithms, and I've collected them together here for easier reading.
]]>Slides from a talk I gave at JS-Montreal about PivotTable.js
]]>When I wear my 'data scientist hat', one of the tools I reach for most often is a pivot table. When I wanted to build a web-based tool that included a pivot table, I couldn't find any Javascript implementation that made sense to me and didn't have crazy assumptions built in, so I rolled my own in CoffeeScript, as a jQuery plugin.
It's now up on GitHub under an MIT license with some nice examples. I hope people find it useful!
If you work with data and you don't know what a pivot table is, I encourage you to learn about them, because they are very useful for quick'n'dirty data analysis. My web-based implementation is a decent learning tool but there are other, much better implementations, such as in Microsoft Excel (although since Office 2003 they've made some changes that were not for the better) and AquaDataStudio.
I posted this on Hacker News and got some nice comments!
]]>In 2005, I was contracted to create a program to support research into the application of a statistical technique called Kernel Density Estimation to the study of global poverty. The result of this contract (which I worked on with my friend and occasional colleague David de Koning) is the Kernel Density Estimation and Analysis tool which I have just released on Github under an open-source license.
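For anyone unfamiliar with kernel density estimation: it smooths a set of observations into a continuous density by placing a small kernel (here a Gaussian of width `bandwidth`) on each observation and averaging. Here's a minimal Python sketch with optional weights, which is one simple way to let each observation stand in for a group such as a population share; it's plain weighted KDE, not the grouped-data estimator the paper develops, and the function name and numbers are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth, weights=None):
    """Return a function estimating the density at x from (weighted) samples."""
    if weights is None:
        weights = [1.0] * len(samples)
    total = float(sum(weights))
    coef = 1.0 / (total * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Weighted sum of Gaussian bumps centred on each sample.
        return coef * sum(
            w * math.exp(-0.5 * ((x - s) / bandwidth) ** 2)
            for s, w in zip(samples, weights)
        )

    return density


# Hypothetical incomes (in thousands) and the population share each represents.
incomes = [1.0, 2.0, 4.0]
shares = [0.5, 0.3, 0.2]
f = gaussian_kde(incomes, bandwidth=0.5, weights=shares)
print(f(1.0), f(3.0))
```

The choice of bandwidth matters a lot in practice: too small and the estimate is spiky, too large and real features of the distribution get smoothed away.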
The research (which, to be clear, wasn't done by me) resulted in a very interesting paper called Kernel Density Estimation Based on Grouped Data: The Case of Poverty Assessment.
]]>In 2003, I wrote a neat and powerful piece of software called Galapagos for my 4th-year undergraduate thesis (download PDF). It was a framework for the development of advanced (i.e. distributed, parallel and/or hybrid) evolutionary algorithms, applicable to a wide range of computationally challenging optimization problems. I applied it to a variety of transportation-related problems at the University of Toronto.
]]>I was invited to speak on a panel at a Rubicon Project product launch, and this is the video of the event.
]]>There’s an organization in Montreal I think is awesome called Santropol Roulant which, among other things, has a meals-on-wheels operation. They have hundreds of volunteers and wanted to upgrade the system they used to store their volunteer information, so I helped them out, and I’ve open-sourced the results, in case any other non-profit wants a very simple volunteer-list management system.
]]>This is the machine-readable back of my new nerdy QR-code business card!
]]>There doesn't appear to be a good Wikipedia entry on RTB that I can link to when I blog about it, so I'll draft my own explanation here. (Edit: there is an entry now, but I like my characterization better!) Keep in mind while reading this that I'm looking at RTB as a software engineer with an interest in economics, rather than as an ad industry veteran!
]]>One of the things we do at Datacratic is to use machine learning algorithms to optimize real-time bidding (RTB) policies for online display advertising. This means we train software models to predict, for example, the cost and the value of showing a given ad impression, and we then incorporate these prediction models into systems which make informed bidding decisions on behalf of our clients to show their ads to their potential customers.
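The division of labour described above, with one model predicting the value of an impression, another predicting its cost, and a policy combining them into a bid, can be sketched as follows. Everything here is hypothetical: the two features, the fixed-rate functions standing in for trained models, and the rule "bid the expected value only when it exceeds the expected cost". It is not Datacratic's actual bidding policy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Impression:
    # Hypothetical features; a real system would use many more
    site: str
    hour: int


def predict_conversion_rate(imp: Impression) -> float:
    """Stand-in for a trained value model (hypothetical fixed rates)."""
    return 0.002 if imp.site == "news.example" else 0.0005


def predict_clearing_cost(imp: Impression) -> float:
    """Stand-in for a trained cost model, in dollars per impression (hypothetical)."""
    return 0.0015 if 8 <= imp.hour <= 20 else 0.0008


def decide_bid(imp: Impression, value_per_conversion: float) -> Optional[float]:
    """Return a bid in dollars, or None to skip this impression."""
    expected_value = predict_conversion_rate(imp) * value_per_conversion
    if expected_value <= predict_clearing_cost(imp):
        return None  # not worth buying at the price we expect to pay
    return expected_value  # bid what the impression is expected to be worth


if __name__ == "__main__":
    print(decide_bid(Impression("news.example", 10), 1.0))   # bids
    print(decide_bid(Impression("other.example", 10), 1.0))  # skips
```

Keeping the value and cost predictions separate means each model can be retrained or swapped without touching the policy that combines them.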
]]>At Datacratic, working with data often means data visualization (or dataviz): making pretty pictures with data. This is usually more like making fully machine-generated images than carefully laying out "infographics" of the Information Is Beautiful school, but I find they usually end up looking pretty good. There are lots of good tools for graphing data, like matplotlib or R or just plain old Excel-clone spreadsheets, but what we use most often is Protovis, the Javascript library for generating SVG, coupled with CoffeeScript, which is a concise and expressive language that compiles down to Javascript.
]]>This is a screenshot of what I pull up on my iPhone every morning now after its alarm clock wakes me up. That's right, it's an interface to turn on my espresso machine so that it will warm up to a specific temperature by the time I'm done snoozing! I can even look at a real-time plot of the temperature to confirm that it's holding where it should be and doesn't need to be bumped up or down a degree.
]]>At Datacratic we tend to worship, like Etsy (and AppNexus!), at the Church of Graphs. We've even started using Statsd, the system they've released to collect stats and relay them to Carbon for display in Graphite. And by display, I mean display on a dashboard visible to the entire dev team at the office, as seen above! Statsd is a very simple system to which you can send UDP messages about various stats you want to track, which it then aggregates and passes along to Carbon, which stores them in Whisper, Graphite's back-end data store. That's a lot of moving parts but it works very well. Sending stats to statsd is extremely easy from any language (we do it from Javascript and C++) and carries low overhead, which is key for the type of work we do.
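To show how lightweight sending a stat is, here is a minimal Python sketch of a statsd client. It uses statsd's plain-text line format, `name:value|type` (for example `c` for counters and `ms` for timers), sent as a single UDP datagram to statsd's default port 8125; the metric names are hypothetical, and this is not the Javascript or C++ code we actually use.

```python
import socket


def send_stat(name, value, metric_type="c", host="127.0.0.1", port=8125):
    """Fire-and-forget one stat to statsd over UDP; returns the bytes sent."""
    payload = f"{name}:{value}|{metric_type}".encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # UDP needs no connection or acknowledgement, which keeps overhead low
        sock.sendto(payload, (host, port))
    return payload


if __name__ == "__main__":
    send_stat("bidder.requests", 1)         # increment a counter
    send_stat("bidder.latency", 12, "ms")   # record a timing in milliseconds
```

Because the sender never waits for a reply, a missing or overloaded statsd process can't slow down the code being measured; the trade-off is that some datagrams may be silently dropped.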
]]>So I finally got around to working on my Silvia Mod Plan, getting all the way to Step 5! The video above is a demo of the setup I have to show a real-time graph on my iPad of the boiler temperature in the Silvia.
Having installed the thermocouple in the Silvia and played with my TC4 shield, my initial plan was to use the Arduino to transmit data to my iMac using XBee as a wireless serial link, where I would run a NodeJS process which would read data from the USB port and which would communicate with the iPad via a WebSocket over Wifi (phew, mouthful!). Ideally the Arduino would speak Wifi but in the meantime I figured I'd play with this setup. I chose NodeJS because it seemed really easy to set up WebSockets using socket.io, and that seemed like a good way to feed data to Smoothie Charts for real-time graphing. I rewrote the code in CoffeeScript, because it's the best way to write NodeJS code IMO (a discovery I made after writing the first version of this code 4 months ago) and because it's so fitting for this project!
]]>So in October I finally bought the Rancilio Silvia I'd been coveting (after coveting a La Pavoni manual espresso machine first!) with the intention of modding it to add digital temperature control, as many have done. This basically involves replacing the stock thermostat with a thermocouple and solid-state relay, plugged into a digital control unit which does PID control. Now being a software guy who likes to tinker with his Arduino, I pretty quickly decided that I wouldn't use a standard industrial controller but I would build a custom one (kind of like this one, but read on for differences). In addition, I decided I didn't want to hang anything from the Silvia or make any externally-visible modifications, partly for esthetic reasons, but mostly because I'm not too good at making good-looking hardware myself. So I decided I would build something I could control wirelessly, from my iPhone for example. "Control" in this case basically means occasionally changing the set point of the controller, and maybe turning it on remotely. One other feature which would be nice would be a timed on/off feature, like "on 30min before I wake up" and "off after 1h of disuse" or something like that.
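The control law at the heart of a PID controller fits in a few lines: the output is a weighted sum of the current error, its accumulated integral and its rate of change. The sketch below is in Python purely for illustration (the custom controller described here would run on the Arduino). The gains, the setpoint and the choice to express the output as a 0 to 1 heater duty cycle for the solid-state relay are assumptions, and it omits refinements such as integral anti-windup.

```python
class PID:
    """Textbook PID controller; gains are placeholders, not tuned values."""

    def __init__(self, kp, ki, kd, setpoint, out_min=0.0, out_max=1.0):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.setpoint = setpoint
        self.out_min, self.out_max = out_min, out_max
        self.integral = 0.0
        self.prev_error = None

    def update(self, measurement, dt):
        """Return a heater duty cycle for one time step of dt seconds."""
        error = self.setpoint - measurement
        self.integral += error * dt
        # No derivative term on the very first step, when there is no previous error
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        output = self.kp * error + self.ki * self.integral + self.kd * derivative
        # Clamp to what the relay can deliver: 0% to 100% on-time
        return max(self.out_min, min(self.out_max, output))


if __name__ == "__main__":
    pid = PID(kp=0.1, ki=0.0, kd=0.0, setpoint=100.0)
    print(pid.update(95.0, dt=1.0))  # 5 degrees below setpoint -> 0.5 duty cycle
```

Changing the set point remotely then amounts to updating `setpoint`, and a timed on/off feature could simply stop calling `update` and hold the relay off.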
]]>On Saturday I attended hackMTL, a one-day hackfest/competition. The ground rules called for creating an app using at least one of a list of APIs. The one that caught my eye was the DokDok API (now Context.io), which basically gives you programmatic read access to your GMail inbox via HTTP/JSON. Since June or so I've been doing more and more visualization of data that I work with (first at Bell then at Recoset) so I figured I'd see if I could make an app that could make a neat picture of my social network, as it's represented in my inbox. I didn't quite finish an "app" per se during hackMTL but I did manage to make a pretty picture (above). The code is up on GitHub, and basically it's a Python script that creates a JSON file which is rendered using Protovis. The circles/graph nodes represent email addresses (aka people) and the links between nodes indicate that the two parties were on the same GMail thread.
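The graph construction described above (one node per email address, one link per pair of addresses that shared a thread) can be sketched in Python. This assumes the threads have already been fetched as lists of participant addresses (the API calls are not shown), and the output shape, `nodes` with a `nodeName` and `links` with `source`/`target` indices plus a `value` weight, mirrors the data in Protovis's force-directed layout examples. It is a hypothetical reconstruction, not the script on GitHub.

```python
import json
from itertools import combinations


def build_graph(threads):
    """threads: iterable of lists of email addresses seen on the same thread."""
    index = {}    # email address -> node index, in order of first appearance
    weights = {}  # (i, j) with i < j -> number of threads the pair shared
    for participants in threads:
        # A set removes addresses that appear more than once on the same thread
        ids = sorted({index.setdefault(addr, len(index)) for addr in participants})
        for i, j in combinations(ids, 2):
            weights[(i, j)] = weights.get((i, j), 0) + 1
    nodes = [{"nodeName": addr} for addr in sorted(index, key=index.get)]
    links = [{"source": i, "target": j, "value": w} for (i, j), w in sorted(weights.items())]
    return {"nodes": nodes, "links": links}


if __name__ == "__main__":
    # Hypothetical threads
    threads = [["me@example.com", "ann@example.com"],
               ["me@example.com", "ann@example.com", "bob@example.com"]]
    print(json.dumps(build_graph(threads), indent=2))
```

Using the number of shared threads as the link `value` lets the layout draw frequent correspondents closer together or with thicker lines.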
]]>After building my little autonomous wanderer robot, I saw this iPad-controlled blimp and went ahead and added iPad control to my robot too. I had a Python script on my iMac which took commands from the iPad over Wifi and relayed them to the Arduino using XBee. I demo'ed this at BitNorth 2010, but I'm not sure if the video is up yet. That said, the fact that all the commands were relayed through a 'base station' bothered me, and I saw a cheap wireless Wii Nunchuk which I immediately bought, thinking I could make it into a really ergonomic remote control for my little robot. I probably could have gotten a Wifi shield instead, but I was up for a different kind of challenge.
]]>Back in April I got an Arduino Duemilanove and a variety of sensors and actuators, which promptly caused me to go order a rover base from RobotShop.ca, and a few days later, I had put this little guy together! Truly, Arduino is an easy-to-use platform for this sort of thing. I went on to put together an iPad-controlled version as well as a wireless Wii-Chuck-controlled version but I'll post about those later.
]]>From October 2005 to May 2010, myEWB.ca was Engineers Without Borders Canada's official online community system. During that time, tens of thousands of people used myEWB to participate in thousands of online conversations, send millions of emails, schedule hundreds of events, share files, apply to serve overseas, register for conferences, pay their membership dues and otherwise collaboratively work towards eradicating poverty. As the team leader and major contributor to the myEWB project, I am proud to be able to host a small page for OpenMyEWB: the now-open-sourced software that powered myEWB.ca.
]]>I always have trouble remembering what famous people were alive when, and more importantly, who was alive during, before or after whose life. It’s easy enough to remember which philosopher came before which other philosopher or which scientist came after which scientist, but often harder to remember which scientist came before which philosopher etc. So I built a PHP script to automatically lay out an HTML timeline to help keep all this stuff in context. I then decided to learn how to use XSL and rebuilt my little experiment using that. Click here with a modern browser to open the timeline.
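One way to lay out such a timeline so that overlapping lifespans never share a row is greedy interval partitioning: process people in order of birth and put each one in the first row whose previous occupant had already died. The post's layout rule isn't described, and the original was written in PHP and then XSL, so this Python function is a hypothetical illustration of the idea rather than the original code.

```python
def assign_rows(people):
    """people: list of (name, birth_year, death_year). Returns {name: row_index}."""
    row_ends = []  # death year of the last person placed in each row
    rows = {}
    for name, born, died in sorted(people, key=lambda p: p[1]):
        for r, end in enumerate(row_ends):
            if end < born:  # previous occupant died before this person was born
                row_ends[r] = died
                rows[name] = r
                break
        else:
            # Every existing row is occupied at this birth year: open a new row
            row_ends.append(died)
            rows[name] = len(row_ends) - 1
    return rows


if __name__ == "__main__":
    people = [("Newton", 1643, 1727), ("Leibniz", 1646, 1716),
              ("Kant", 1724, 1804), ("Darwin", 1809, 1882)]
    print(assign_rows(people))
    # {'Newton': 0, 'Leibniz': 1, 'Kant': 1, 'Darwin': 0}
```

Processing lifespans in order of birth keeps the number of rows equal to the largest number of people alive at the same moment, so contemporaries sit side by side and the timeline stays compact.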
]]>I did a 'short bit' presentation at Bitnorth 2009 about my feelings on myEWB and its history.
The presentation is available as PDF, PowerPoint and MP3.
]]>You're part of a small-to-medium group of people who trust each other and have some reason to spend time and money together (e.g. you have a few housemates or you work in a friendly office environment). It's a huge hassle to constantly be requesting separate bills or lending each other money and losing track of who owes who how much and for what. You waste time quibbling over small amounts and people in your group don't like lending each other money or picking up the tab for lunch.
Using this simple web application, a group of people who trust each other can keep a running tab of how much they owe each other. This is handy for figuring out such things as who should pay for groceries next time (e.g. whoever owes the most) and generally makes people more easy-going about spotting each other money or picking up lunch, as they know they can keep track of it. With PHPTab, any member of your group can be as lenient or penny-counting as they want, without constantly having to trade small sums of money. It's just a matter of "putting it on their tab".
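The bookkeeping behind a running tab is small: each person has a single balance, and all the balances always sum to zero. Here is a minimal Python sketch, assuming an even split among the people who benefited and amounts stored in integer cents to avoid rounding drift. The `Tab` class is hypothetical and is not PHPTab's actual code or schema.

```python
from collections import defaultdict


class Tab:
    """Shared running tab in cents: positive = the group owes you, negative = you owe."""

    def __init__(self):
        self.balances = defaultdict(int)

    def record(self, payer, amount_cents, beneficiaries):
        """payer paid amount_cents on behalf of beneficiaries, split as evenly as possible."""
        share, remainder = divmod(amount_cents, len(beneficiaries))
        self.balances[payer] += amount_cents
        for k, person in enumerate(beneficiaries):
            # The first `remainder` people absorb one extra cent so shares sum exactly
            self.balances[person] -= share + (1 if k < remainder else 0)

    def next_to_pay(self):
        """Whoever owes the most, i.e. has the lowest balance."""
        return min(self.balances, key=self.balances.get)


if __name__ == "__main__":
    tab = Tab()
    tab.record("alice", 3000, ["alice", "bob", "carol"])  # Alice buys lunch for three
    tab.record("bob", 1000, ["bob", "carol"])             # Bob buys coffee for two
    print(dict(tab.balances))  # {'alice': 2000, 'bob': -500, 'carol': -1500}
    print(tab.next_to_pay())   # carol
```

Because each purchase adds exactly as much to the payer as it subtracts from the beneficiaries, nobody ever has to settle up in cash; the balances simply drift back toward zero as people take turns paying.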
]]>In the fall of 2003, my classmates and I were given an assignment in our Bridge Design course, to model the Salginatobel Bridge as a truss and analyse it using the Stiffness Method in Excel (ignoring buckling effects). As an additional challenge, we were to be awarded bonus marks if we could improve upon the efficiency of this, our professor's favourite bridge, by moving the members around.
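For readers unfamiliar with the Stiffness Method, here is a compact Python sketch of the direct stiffness method for a 2D pin-jointed truss: add each member's stiffness into a global matrix, drop the restrained degrees of freedom, solve for the joint displacements, then recover each member's axial force from its change in length. It is a generic textbook version, not the course's Excel model; it assumes every member has the same axial stiffness EA and, like the assignment, ignores buckling. An unstable (mechanism) truss would make the matrix singular and the solve would fail.

```python
import math


def solve(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def analyse_truss(nodes, members, supports, loads, ea=1.0):
    """nodes: [(x, y)]; members: [(i, j)]; supports: restrained DOFs
    (2*node for x, 2*node+1 for y); loads: {dof: force}.
    Returns the axial force in each member (tension positive)."""
    ndof = 2 * len(nodes)
    k = [[0.0] * ndof for _ in range(ndof)]
    geometry = []
    for i, j in members:
        dx, dy = nodes[j][0] - nodes[i][0], nodes[j][1] - nodes[i][1]
        length = math.hypot(dx, dy)
        c, s = dx / length, dy / length
        geometry.append((c, s, length))
        dofs = [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]
        t = [-c, -s, c, s]
        # Member stiffness in global axes is (EA/L) * outer(t, t)
        for a in range(4):
            for b in range(4):
                k[dofs[a]][dofs[b]] += ea / length * t[a] * t[b]
    free = [d for d in range(ndof) if d not in supports]
    k_ff = [[k[r][c] for c in free] for r in free]
    f_f = [loads.get(d, 0.0) for d in free]
    u = [0.0] * ndof
    for d, val in zip(free, solve(k_ff, f_f)):
        u[d] = val
    forces = []
    for (i, j), (c, s, length) in zip(members, geometry):
        # Elongation = relative displacement projected onto the member axis
        elongation = c * (u[2 * j] - u[2 * i]) + s * (u[2 * j + 1] - u[2 * i + 1])
        forces.append(ea / length * elongation)
    return forces


if __name__ == "__main__":
    nodes = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
    print(analyse_truss(nodes, [(0, 2), (1, 2)], {0, 1, 2, 3}, {5: -10.0}))
```

In the example, two members run from supports at (0, 0) and (2, 0) to an apex at (1, 1) carrying a 10-unit downward load; both come out in compression at -10/√2 ≈ -7.07, which matches a hand calculation by joint equilibrium.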
]]>