Personal Data Warehouses: Reclaiming Your Data
I like the way that Simon is liberating his data from silos and making it work for him.
I like the way that Simon is liberating his data from silos and making it work for him.
The title says it all, really. This is another great piece of writing from Paul Ford.
I’ve noticed that when software lets nonprogrammers do programmer things, it makes the programmers nervous. Suddenly they stop smiling indulgently and start talking about what “real programming” is. This has been the history of the World Wide Web, for example. Go ahead and tweet “HTML is real programming,” and watch programmers show up in your mentions to go, “As if.” Except when you write a web page in HTML, you are creating a data model that will be interpreted by the browser. This is what programming is.
This strikes me as a sensible way of thinking about machine learning: it’s like when we got relational databases—suddenly we could do more, quicker, and easier …but it doesn’t require us to treat the technology like it’s magic.
An important parallel here is that though relational databases had economy of scale effects, there were limited network or ‘winner takes all’ effects. The database being used by company A doesn’t get better if company B buys the same database software from the same vendor: Safeway’s database doesn’t get better if Caterpillar buys the same one. Much the same actually applies to machine learning: machine learning is all about data, but data is highly specific to particular applications. More handwriting data will make a handwriting recognizer better, and more gas turbine data will make a system that predicts failures in gas turbines better, but the one doesn’t help with the other. Data isn’t fungible.
A step-by-step explanation by Henrik on how he implemented fuzzy search on his music site—something I could do on The Session. He even talks about expanding this to work with ABC notation.