SemVer, or Semantic Versioning, is an important tool to help us build large modular systems. Most modern programming languages have a package manager of some description to go with them. This lets you install dependencies easily, and it allows those dependencies to have their own dependencies and so on. The advantage of this is that modules can share code, which saves time and speeds up bug fixing.
For the Library Consumer
OK, so let’s imagine you’re building a large web app and you’ve got lots of dependencies, because you know that’s better than rewriting code or copy-and-paste programming. You have broadly 3 options for how you lock down your dependency versions:
- Use * to always get the latest version.
- Use an exact version number to always stay on the same version.
- Use some kind of range specifier.
Option 1 is bad news. If you use option 1 then every time someone changes their module, yours might break. It means you automatically always get the latest version and you don’t keep any information about what version your app was actually tested on.
Option 2 has two issues. Firstly there’s a false sense of security. You may have used exact specifiers for your dependencies, but the dependencies of your dependencies probably haven’t, so this doesn’t really fully protect you. The other problem is that you won’t automatically get bug fixes. You’ll only get bug fixes if you explicitly upgrade.
Option 3 is usually the best option. If you write your version specifier as ~1.0.0 you will get the latest version that’s been released and begins 1.0. This means that if the highest released version beginning 1.0 were 1.0.5, you would get version 1.0.5, even if versions with a higher second or first number had also been released. The key to this semantic versioning is that the first number represents large or breaking changes, the second number represents new features that might cause minor backwards incompatibilities, but probably won’t. The third number represents bug fixes. As such, this specifier leads to you getting all bug fixes automatically, but forces you to manually update if you want new features.
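The range behaviour described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how any real package manager is implemented: versions are assumed to be plain MAJOR.MINOR.PATCH strings, the function names are made up for this post, and the list of released versions is hypothetical.

```python
from typing import List, Optional, Tuple


def parse(version: str) -> Tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def satisfies_tilde(version: str, base: str) -> bool:
    """Return True if `version` is allowed by the range `~base`.

    Simplified rule: the first two numbers must match `base`,
    and the third number must be at least as high.
    """
    v, b = parse(version), parse(base)
    return v[:2] == b[:2] and v[2] >= b[2]


def resolve_tilde(base: str, released: List[str]) -> Optional[str]:
    """Pick the highest released version allowed by `~base`, if any."""
    candidates = [v for v in released if satisfies_tilde(v, base)]
    # Compare as integer tuples so that 1.0.10 sorts above 1.0.9.
    return max(candidates, key=parse, default=None)


# Hypothetical release history, for illustration only.
released = ["0.9.0", "1.0.0", "1.0.1", "1.0.5", "1.1.0", "2.0.0"]

print(resolve_tilde("1.0.0", released))  # prints 1.0.5
```

Note that 1.1.0 and 2.0.0 are skipped: new features and breaking changes only arrive when you change the specifier yourself.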
For the Library Author
As a library author the rules are simple:
- Increment the third number for bug fixes. Never introduce any breaking changes without incrementing another number.
- Increment the second number for new features that don’t change the API. This should mean that 99% of users won’t see any breaking changes, but if someone’s relied on an odd corner case, or done something a little weird, it might break.
- Increment the first number every time there’s a major breaking change. If you expect most people will need to do a little bit of manual work to upgrade, then increment this number.
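The three rules above map onto a small helper. This is a rough sketch rather than a real release tool: the function name and the change labels "fix", "feature" and "breaking" are invented for this example, and resetting the lower numbers to zero follows the usual SemVer convention rather than anything stated in the list.

```python
def bump(version: str, change: str) -> str:
    """Return the next version number for a given kind of change.

    `change` is one of the hypothetical labels 'fix', 'feature'
    or 'breaking'.
    """
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "breaking":
        # Major breaking change: first number goes up, the rest reset.
        return f"{major + 1}.0.0"
    if change == "feature":
        # New feature that doesn't change the existing API.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Bug fix only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")


print(bump("1.4.2", "fix"))       # prints 1.4.3
print(bump("1.4.2", "feature"))   # prints 1.5.0
print(bump("1.4.2", "breaking"))  # prints 2.0.0
```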
What could go wrong?
You might ask yourself, “Does it really matter that much?” The answer is that it does. If you break things without incrementing the correct version number, then other software that relies on semantic versioning will break. If you frequently increment too high a number, your bug fixes will be slow to propagate and people will run into errors that have been fixed for ages.
Safety First
There’s one problem I keep seeing from people who normally follow SemVer. What happens when you realise your back-room project has become hugely popular and that your hastily concocted API just isn’t cutting it anymore? You decide it’s time for a total redesign. You’re going to create a totally new version and everyone’s code will break unless they manually change things to be compatible with the new API.
When you’re faced with doing this, you have to be prepared. You will get users who get stung by the change. There will be people who used * as their version specifier when they depended on your library. These people have never learned about SemVer so they just feel cheated that you broke their module. Expect to have to patiently explain SemVer to a few people in a couple of GitHub issues. Perhaps link to this, or another, blog post.
When faced with this prospect, I see lots of library authors going “why not release it under a new name?”. This is what @mishoo decided to do with uglify-js. Instead of publishing it as a new major version of uglify-js, he published it as uglify-js2. The long-term problem with starting out this way is that there are still a number of modules that depend on the uglify-js2 version, even though he did ultimately switch back to publishing under the uglify-js name. You might argue that this was a problem that could have been avoided if he just stuck to his original plan of calling it uglify-js2.
There are 2 key problems with this:
- Discoverability
- The future’s a whole lot longer than the present.
Discoverability is a problem because all the people using your old version may never find out about the new version. Tools like gemnasium and david won’t report the version as out of date, because they won’t know about the new version. This means people will miss out on awesome features. Perhaps worse, though, are the newcomers. They may find the old version and stop there. For example, a Google search for uglifyjs displays the original GitHub repository for uglify-js before the new one for uglify-js2. This is because @mishoo still hasn’t fixed the mistake of creating a whole new GitHub repo for the new version (rather than making use of branching). Many newcomers would just see that first result and use it. Even the ones who see the second result might see that the first is more popular, assume that the second is some botched attempt at creating a better copy of UglifyJS, and just use what appears to be the original, more popular, official version.
The second problem is that I envisage uglify-js being one of the most popular JavaScript minifiers for at least another couple of years. It’s fully possible that if we keep using JavaScript, and uglify-js stays up to date, it could end up being the most popular minifier for another 10 or 20 years. That’s a long time for us to live with adding a 2 to the end of the name. It might not seem like much, but everyone adding a 2 to the end forever more is a definite pain, and can definitely be avoided. The small pain of breaking a few apps/code briefly now vs. slowing down software development forever is, in my opinion, not a tough choice.
Teaching
It’s important to use SemVer, even if those around you sometimes get it wrong. If we don’t, then people won’t learn. If you’re writing a library, and you don’t let people get stung because they didn’t follow SemVer, then those people will never learn the valuable lesson about how to specify version numbers. Most of the people who get stung probably won’t be big professional developers, they’ll be people just starting to learn, and it’s better that they learn now than later.
If you’re consuming a library, you should use SemVer, and trust your unit tests (make sure you always write some). This way you’ll find out when something breaks, and can teach the library authors about SemVer when they publish versions with breaking changes.
Every rule is made to be broken
I break this rule myself, for two reasons.
- If I know a version is broken, I may use absolute version numbers to avoid the new version until it’s fixed. If I do this, I’ll always open an issue on the library author’s GitHub repo, or send them an e-mail if I can’t find the GitHub repo.
- For my development dependencies I often just use *. This rarely results in breaking changes; it’s always me, or another library author, who discovers them, not an end user; and it’s a pain to always have to worry about updating them manually.
Other than this, if you see a library of mine that doesn’t use SemVer, it’s because I wrote it before I knew about SemVer.