Yesterday was the 25th birthday of the Free Software Foundation (FSF).
Over the past 25 years they’ve managed to deeply influence the world of computing and technology. The ideas promoted by the FSF’s founder and leader, Richard Stallman, have certainly veered toward the radical side. However, even if you don’t agree with this group’s ideology, I’d argue that we are afforded a lot more software freedom thanks to their activism.
Free and Open Source software (FOSS) has made our world a better place, and even if we put certain ideologies aside, software that is expandable, adaptable and often (but not necessarily) available for free is certainly a good thing for the end user.
Yet the business world has only partially embraced FOSS, and it’s rare to see an organization that runs its operations exclusively on Open Source software and web applications released under OSI-approved licenses.
There are many reasons why this is the case, but the fundamental one is that software is only a means to an end for most businesses. They’ll use whatever they feel works best, is cheapest, best supported, and that they’re most familiar with. After all, they’re in business to make money, so religiously adhering to software philosophies is not usually at the top of their priority list.
Open Source business models
You’ll notice that for the most part, companies who develop FOSS don’t make their money from that FOSS itself. While they could theoretically sell it, the existence of clauses that allow their customers to redistribute the software for free makes it a less than ideal business model.
As such, better business models had to be invented to work around the practical (not theoretical) limitations of making money with FOSS. The three most common are dual licenses, Open Core, and support-based models (and combinations thereof).
(Note that I’m mostly talking about desktop software here. Open source web apps tend to capitalize on the hosted nature of the service and are easier to monetize even when the software itself becomes freely available.)
Dual license: A commercial license is often used in conjunction with viral Open Source licenses like the GPL. A company setting up a dual license of this kind essentially relies on other companies’ inability to accept the viral license (because they don’t want to give their code away just because they used it in conjunction with your GPL code).
Open Core: The core product is Open Source, but if you want extra features that are often indispensable to businesses, you’ll have to pay for them (as they’re commercial and proprietary). A company adopting this model is essentially using a freemium model, in which they use Open Source as a gateway to selling you proprietary software (which is the real way they make their money).
Support-based: Finally, there is the support option. Development companies went from basically having a license to print money to a much less scalable business model where they generate revenue based on the support they provide. Incidentally, while support is not the main “vocation” of most people who start software companies, it can be a lucrative avenue nevertheless.
Open Source as a nice to have
My point is that in all three cases, businesses may appreciate the Open Source nature, but it’s not a key factor in their decision making. In fact, in the first case they are buying a commercial license to escape the contractual limitations of an Open Source license; in the second case, they are paying outright for proprietary software; and in the third case, they are paying for the assurance that the software is maintained and that there’s going to be someone who’s accountable and ready to assist if problems arise (this is a real problem with many less popular community-driven Open Source projects).
In short, as long as the software fits the bill, companies will not take an idealistic view when it comes to software, but rather focus on what affects their bottom-line. It’s fairly safe to say that most companies really care about having software that is reliable, saves them time (their most precious resource), and that’s well supported. Open Source is a “nice to have” for most of them.
Why would you use commercial databases?
As an advocate for the DB2 team I often hear people ask, “Why would you use DB2 when MySQL is free and Open Source?”. The truth is that countless companies use DB2 because they know it ends up being cheaper in terms of both time and cost, when compared to Open Source database solutions.
I don’t doubt that many developers who work with Open Source technologies may care about their database being free (as in beer), free (as in speech), and able to provide the subset of functionality available in pretty much any relational database. In that case, MySQL is fine. But the needs of companies, both large and small, will often differ from those of your typical solo hacker.
IBM has an extensive number of DB2 customers because these companies care about features that don’t exist or are very rough around the edges in many Open Source databases.
Most of our customers care about:
- Proven reliability and advanced security features;
- Utmost performance;
- Ability to natively handle XML documents and data (some industries have to do so as a result of regulations);
- Vertical and horizontal scalability as their business grows;
- Advanced and easy SQL Replication and High Availability and Disaster Recovery (HADR) tools;
- Backup compression to save storage costs (it’s not an issue if you have a web app that stores a grand total of 50MB of data, but it’s fundamental when you have 50TB);
- 24/7 support from a reputable company that will truly stand behind the product they sell;
- Ease of administration (DB2 offers autonomic features that pretty much self-adjust DB2 automatically based on the workload and usage it’s handed);
- Overall TCO (which involves a lot more than just looking at the price tag for the software).
DB2 rocks all of these very crucial points. Furthermore, with the DB2 Express-C version, you get to use a production ready version absolutely for free.
Don’t let the name scare you. Oracle has really given the “Express” brand a bad vibe. Oracle XE is full of security holes, hasn’t been updated in 5 years, and is a severely crippled edition. Conversely, DB2 Express-C is regularly updated and imposes very few limitations on you.
True, even DB2 Express-C has some limitations when it comes to the kinds of resources that it can use (up to 2GB of RAM and 2 CPU cores). Fear not though, it’s still plenty useful for many startups or small businesses (it’s used in production by many such folks). With DB2 Express-C there are no limits on the number of databases, their sizes, the number of users, connections, etc. It’s the perfect starter edition. The presence of three commercial editions (Express-C FTL or Express, Workgroup, and Advanced Enterprise) guarantees you that as your business flourishes, DB2 will grow with you.
How far can you go with top of the line hardware and the commercial editions? Let’s put it this way: every time you buy something with your VISA, that transaction passes through DB2. Furthermore, DB2 holds the world record for TPC-C benchmarks, with over 10,000,000 transactions per minute.
Let’s make DB2 available to more startups
Today IBM is announcing new pricing to extend the benefits of our support and commercial edition to a greater number of startups and small businesses. You can get the commercial Fixed Term License for about $1,500 per server, per year. (I pay more per year to host this blog.) Aside from 24/7 support, regular fix pack updates, and the clustering option for SQL Replication and HADR, this DB2 Express-C license will also allow you to use up to 4GB of RAM and 4 CPU cores. (There is an Express edition if you’d prefer to pay a one time fee, rather than a yearly one, or if you’d like to pay through other metrics such as per user.)
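To make the TCO argument a little more concrete, here is a rough back-of-the-envelope sketch in Python. The only figure taken from this post is the $1,500 per server, per year price; the server count, administration hours, and hourly rate are made-up placeholders, and the `total_cost` helper is purely illustrative, not an IBM pricing tool.

```python
def total_cost(license_per_server_year, servers, years,
               admin_hours_per_year, hourly_rate):
    """Licensing plus administration labor over the whole period, in dollars."""
    licensing = license_per_server_year * servers * years
    labor = admin_hours_per_year * hourly_rate * years
    return licensing + labor


# $1,500 is the per-server, per-year price quoted in this post.
# Every other number below is a hypothetical placeholder, not a measurement.
supported = total_cost(1500, servers=2, years=3,
                       admin_hours_per_year=100, hourly_rate=80)
unsupported = total_cost(0, servers=2, years=3,
                         admin_hours_per_year=150, hourly_rate=80)

print(supported, unsupported)
```

The outcome depends entirely on the assumed administration hours: plug in your own estimates, and the comparison can easily go either way.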
Commercial databases, particularly innovative ones like DB2, have their place in our industry both today and in the future, even in the face of the indisputable popularity of Open Source options. Businesses should evaluate proprietary and Open Source solutions on technical merits rather than ideological ones. In my opinion, there is space for both of these models to peacefully coexist to better serve the diverse needs of the business world.
DISCLAIMER: These are my controversial opinions on the topic at hand. They do not represent the opinions of IBM or anyone else associated with IBM. In fact, I wrote this post at 2 AM on my own laptop and published it on my personal blog. No executives were consulted (or harmed) in the making of this post. 🙂
Thanks for the post. However, I found the list of features too generic.
My wife asked: if MySQL is good enough for Facebook and Twitter, why would you buy Oracle or DB2?
I have been looking for a detailed and convincing answer to that question for a while now. A blog to answer this question would be highly appreciated.
Nadal, the decision to go with the pricey, proprietary database is most often based upon a corporate employee’s need to have someone else to blame if something goes wrong. Also, the sales person who sold the product will make inflated claims as to the availability of support when difficulties arise.
Your wife is a wise woman. ;^)
Hi Nadal, MySQL is great for handling web traffic. Facebook and Twitter generate boatloads of traffic and requests, but most companies are not Facebook and Twitter and so they have a very different set of needs out of their database systems. To put this another way, I would be highly doubtful that Facebook runs their financial systems through MySQL.
Generally, companies require the very highest of standards when it comes to the availability and integrity of their business data, and MySQL just isn’t there. They will often run applications that require advanced database capabilities that MySQL simply does not support yet.
While MySQL scales out horizontally very well, it is not well regarded for its ability to scale vertically, that is, handling heavy workloads on a single high-performance machine.
There are a number of attractive features in commercial databases that warrant the sometimes higher cost in licensing fees, and I think IBM, Microsoft and Oracle are all playing it smart when it comes to offering their low end platforms for free.
MySQL is not the only Open Source DB system.
They don’t use the stock product (the one you get with an “apt-get install mysql-server”). These kinds of setups use highly tuned and customized (hacked) versions of the software.
Besides, the dirty work at Facebook is done by a large NoSQL caching system.
Your wife is asking a brilliant question. The answer is very similar to the answer to “why do I need a truck when I have a minivan?”. If what you want to do is drive a bunch of kids to soccer practice (football for those of you outside of North America) then a minivan is exactly what you need. But if you are in the construction business and you need a vehicle to carry your crew of 4 and half a ton of materials then you will want to look at a truck with an extended cab. Think of DB2 Express-C as being a truck with an extended cab and MySQL as a trusted family minivan. Sure, you could rig the minivan with a trailer and load it up with your construction materials. You may burn out your transmission by hauling too much load but hey, it will give you a start in the construction business with very little money at first. If you are a Facebook/Twitter/Google, you have brilliant engineers who can afford the time to rig the hitch and a trailer. You may get really sophisticated with sharding and master-slave replication, memcached and a dozen other tricks to get your MySQL to performance and reliability similar to what you can get with DB2. But if you are building a business and business applications you may do better by using a better foundation where all of that stuff is already built for you.
What we did with DB2 Express-C is make it follow the same business model as MySQL, i.e. get it for free with a very inexpensive optional subscription if you need it. We made that subscription even cheaper than MySQL’s. Why did we do this? Because we really wanted everyone to get an opportunity to get started with a technology that has a lot more legs and can take you much further than rigging your own trailer for a minivan can. We are not under any illusion that the world will stop using MySQL, nor should it. It is obviously the right platform for Facebook. But for those of us not building the next Facebook, DB2 Express-C and its business model of “Free license with low cost optional subscription” for the world’s most respected commercial DBMS may be just the ticket.
Most “Express” products available from other companies are free. Well done trading on that.
My thought is, no one is going to pay $1500 for a crippled database engine with limitations. They’ll just use one of the other free systems, and DB2 will continue to be used by companies who only deal with IBM.
Most of the features you’ve listed are more than adequately covered by Postgres.
Such as live replication, which Postgres just started to include in its core last month?
Don’t get me wrong, Postgres is a fantastic database. It’s a truly capable platform among its lighter weight brethren (MySQL and SQLite) but it has performance issues not found in its commercial competitors. Check out their mailing lists regarding SELECT COUNT(*) — everything seems to have its trade offs but when it comes to commercial competitors, sometimes price is the better one to give up.
Thanks for mentioning COUNT(…).
The reason PostgreSQL’s count() is slow is because of the MVCC design and the way tuple visibility information is stored. The MVCC design is part of what permits Pg to avoid having readers block writers and vice versa and what lets it minimize use of locking. The downside is that it has to scan the whole heap to determine which tuples the calling transaction can see in order to count how many the transaction can see.
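To make that concrete, here is a toy sketch in Python of why a count under MVCC has to visit every tuple. It is not PostgreSQL’s actual implementation: the `TupleVersion` class, the `visible` rule, and the idea of a snapshot as a single transaction id are deliberate simplifications, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TupleVersion:
    value: str
    xmin: int                    # id of the transaction that created this version
    xmax: Optional[int] = None   # id of the transaction that deleted it, if any


def visible(t, snapshot_xid, committed):
    """Simplified rule: a version is visible if its creator committed at or
    before the snapshot and no committed deleter did the same."""
    created = t.xmin in committed and t.xmin <= snapshot_xid
    deleted = (t.xmax is not None
               and t.xmax in committed
               and t.xmax <= snapshot_xid)
    return created and not deleted


def count_visible(heap, snapshot_xid, committed):
    # No shortcut: each snapshot may see a different set of tuples,
    # so every version in the heap has to be checked.
    return sum(1 for t in heap if visible(t, snapshot_xid, committed))


heap = [
    TupleVersion("a", xmin=1),
    TupleVersion("b", xmin=2, xmax=4),   # deleted by transaction 4
    TupleVersion("c", xmin=5),
    TupleVersion("d", xmin=3),           # transaction 3 never committed
]
committed = {1, 2, 4, 5}

for xid in (3, 4, 5):
    print(xid, count_visible(heap, xid, committed))
```

Three snapshots of the same table give three independent answers, which is why a single stored row count cannot serve every transaction.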
I’d be interested in how DB2 handles this. Is it a low-lock design where readers don’t block writers and vice versa, like PostgreSQL is? If so, does it use MVCC or some other approach? And if MVCC, how does it efficiently maintain a tuple count for a table?
There’s ongoing work in PostgreSQL that’s trying to use the visibility map (currently only used for autovacuum) to skip over sections of tables that’re known not to be visible to any transaction. That’ll help a fair bit. Hopefully in the longer run the work into covering indexes and supporting visibility information in indexes should cover the gap completely.
The complaints about COUNT(*) are mostly from webapps, though, and mostly when trying to get a count of records in the entire table. I rarely see “line of business” apps that seem to want to quickly and frequently find out how many records are in a whole table – as opposed to a subset of records, which is slower in ANY database.
PostgreSQL’s performance is far from perfect – but honestly, that’s true of everything out there, different systems just make different tradeoffs. To me, though, the bigger advantages of Oracle (and perhaps DB2) are in the areas of massively superior replication and clustering, better monitoring and tuning tools, and better tools for load accounting/user accounting/etc.
I must say, though, that DB2 has typically been undersold by IBM, who do an incredibly poor job of communicating to *technical* people why they should care, instead focusing on imposing it on IT by pushing it to management. It has a bit of a reputation as an expensive dinosaur, though not as much so as UniVerse. Some good documentation on its *technical* features that’s honest about what it’s not good at as well as what it is good at would go a long way toward making me, at least, have some interest in it.
“Proven reliability and advanced security features”
— The only way I know if it works is if I can see the code and test it. “Proven” my ass.
Utmost performance;
–What does that even mean? Besides, open source systems (MySQL, PostgreSQL) run fast as hell.
Ability to natively handle XML documents and data (some industries have to do so as a result of regulations);
–Makes sense. Yey for the compliance chicken! Not many do this, you’re right.
Vertical and horizontal scalability as their business grow;
–Again, MySQL and PostgreSQL are extremely scalable. See also: Facebook, Google, Yahoo’s 2 petabyte PostgreSQL database, etc.
Advanced and easySQL Replication and High Availability and Disaster Recovery (HADR) tools;
–Could use some work in open source solutions, but they aren’t bad either.
Backup compression to save storage costs (it’s not an issue if you have a web app that stores a grand total of 50MB of data, but it’s fundamental when you have 50TB);
–Ok, open source tools have these too.
24/7 support from a reputable company that will truly stand behind the product they sell;
–Marketing garblygook. What koolaid are you drinking?
Ease of administration (DB2 offers autonomic features that pretty much self-adjust DB2 automatically based on the workload and usage it’s handed);
–So do open source solutions.
Overall TCO (which involves a lot more than just looking at the price tag for the software).
–Exactly. TCO is much lower when you can look at the goddamn code yourself.
“24/7 support from a reputable company that will truly stand behind the product they sell”
I agree with the poster taking issue with this. I’ve rarely had good experiences with commercial support; usually I find myself talking to a drooling moron who wouldn’t recognise a stack trace from a segfault and certainly won’t understand it. You have to battle past these people and their “help” to get to submit an obvious bug, with test case and crash dump, to someone capable of understanding it. So you land up not only doing most of your own support anyway, but then have to fight past the company’s support people to get bugs acknowledged and fixed!
I haven’t dealt with IBM for this, so perhaps they’re a rare exception to the general rule. I *have* dealt with Adobe (utterly pathetic), Quark (worse), Microsoft (surprisingly good especially for MSDN subscribers) and Apple (better than Quark, but that’s about it) and my experiences have not impressed me. Adobe in particular deserve special mention for selling expensive multi-user licenses for software they then completely fail to support even to the point of fixing reproducible and obvious bugs.
Most people use the “big company” argument more for cover-your-ass. “It’s IBM’s fault, not mine, that our database broke!”. Some try to claim it gives you someone to sue if it breaks – but I’ve never heard of anyone successfully suing Oracle, IBM, etc and their army of lawyers over data loss/failure. I doubt IBM are going to speak up by showing you cases where they’ve failed, either – they’ll settle quietly out of court if they don’t just stonewall.
So: I don’t buy the support argument. I’ve invariably had better support when I’m paying a 3rd party who doesn’t roll support into licensing for support, because they’re motivated to actually deliver on their claims or lose business.
I’m biased because I’m a DB2 admin, but I’d like to give a response to some of your statements.
“Proven reliability and advanced security features”
– The only way I know if it works is if I can see the code and test it. “Proven” my ass.
— I get your concern. However, most people/companies don’t have the time nor resources to go through the code or even understand it. Given that the codebase for the DB2 optimizer is larger than the entire MySQL product, this is impractical to say the least. DB2 runs on practically everything, has been doing so for a very long time and for me and most businesses this means proven stable.
Utmost performance;
–What does that even mean? Besides, open source systems (MySQL, PostgreSQL) run fast as hell.
— Simple, it’s very fast. He didn’t say it’s the fastest in any situation, did he? Don’t pay too much attention to the ridiculous TPC-C numbers, but the performance and the optimizer are very good. I have seen similar capabilities to the new “Join Removal” feature from PostgreSQL 9 for a long time in DB2. For a simpler comparison – DB performance comparisons are Hard – just check the count(*) comparison on this blog (google “android sheep db2 mysql”).
Ability to natively handle XML documents and data (some industries have to do so as a result of regulations);
–Makes sense. Yey for the compliance chicken! Not many do this, you’re right.
— I know a government organisation that stores the tax filings in XML because they change too much every year. I used the XML publishing options once because the memory Hibernate was using for this purpose was going to the roof in this particular case. There are good reasons to use XML as there are very good to not do it (popular ORM incompatibility comes to mind). I’m glad it’s there.
Vertical and horizontal scalability as their business grow;
–Again, MySQL and PostgreSQL are extremely scalable. See also: Facebook, Google, Yahoo’s 2 petabyte PostgreSQL database, etc.
— In the case of Facebook and Google, it’s not so much the database that is scalable, but the applications that are using it. Making the application responsible for handling database outages, replication, … is only available to companies of a large size with specific requirements (who cares about the loss of two facebook posts, really?). There is still a large userbase that just wants a place to dump and retrieve a lot of relational data really fast while knowing it will be kept safe there. In this case, things like DB2 HADR/PureScale, Oracle Data Guard/RAC, … really shine. All this sharding people program on top of MySQL? DB2 has provided this capability for years (http://planet.mysql.com/entry/?id=17130). Not for free though…
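For readers who have not seen what “sharding programmed on top of the database” looks like, here is a minimal sketch in Python. It assumes plain dictionaries standing in for separate database servers; `shard_for` and `ShardedStore` are hypothetical names, and real deployments also have to deal with resharding, failover, and cross-shard queries, none of which is shown here.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Application-level router: each dict stands in for one database server."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key: str, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str):
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
store.put("user:42", {"name": "example"})
print(store.get("user:42"))
```

Note that the routing logic lives in the application, which is exactly the responsibility the comment above says most companies would rather leave to the database.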
24/7 support from a reputable company that will truly stand behind the product they sell;
–Marketing garblygook. What koolaid are you drinking?
— I can tell you first hand that IBM support is competent. You quickly get a qualified person on the phone, the labs are not that difficult to reach through support if needed and around-the-sun support is available if you’re in really deep ****.
Overall TCO (which involves a lot more than just looking at the price tag for the software).
–Exactly. TCO is much lower when you can look at the goddamn code yourself.
— There’s a lot to TCO, but I doubt looking through the code yourself is going to improve it a lot…
>“Proven reliability and advanced security features”
>– The only way I know if it works is if I can see the code and
> test it. “Proven” my ass.
You must be one of the one thousandth of 1% who look at the DBMS code. The rest of us would rather somebody else looked at the code and fixed it.
You rely on DB2 to do its job every time you go to your bank, book an airline ticket or make an insurance claim. Surprisingly, the people who run banks, insurance companies and airlines are able to do so without having access to source code.
On a personal note, I have full access to DB2 source code as I do to MySQL and PostgreSQL. I have 20 years of experience in DBMS. It never occurred to me that I need to look in to source to figure out what the problem is. Personally, I look at the traces to try to figure out who is the right person to look at the problem. I am willing to bet that I am more of a typical DBMS user than those like you who look at the source.
There are many out there who will claim there is a huge benefit from having access to source or being able to contribute code. Reality is, that most of these people are more in love with the philosophy than involved with the practice of open source. In 2009, the total number of contributions to MySQL was less than 40.
i somewhat disagree with some things you said here, one of them is “24/7 support”… i had a job supporting oracle eBusiness Suite and related databases; if by “24/7 support” you mean a support website you are right… even when my employer was paying top dollar for licenses and support it was difficult to get a human on the phone, if we wanted a human CAPABLE of resolving our problems that was another story.
maybe IBM offers better support, i don’t know… just my 2 cents.
This was a great start at a list of technical reasons, but I think people overlook two key points about databases. First, hardly anyone uses a database as a standalone application. That is, very few people are running a service where the customer issues SQL directly to a database. While this was once common (e.g., browsing IMS on a mainframe and tapping various PFkeys to launch behaviors), that sort of thing is dead. Today we run apps on top of the database.
Which brings me to the main reason people choose a particular commercial database: because the app they want to use runs atop that database and no other.
You will almost never (again, today) find a company saying to themselves, “Gosh, I’d like to run XYZ app, but I’m absolutely committed to the Sparc. I’m not going to install any Intel systems because I don’t like Intel’s business practices. I guess I’ll have to find a different app, write my own, or give up.” So why would you argue that about a database? You really don’t want an accounting system because it only runs on [ Oracle | SQL Server | DB2 ]? You’re going to write your own? Good luck.
I’d venture a guess that although Facebook is running a great deal of their service off of MySQL, there’s still some commercial database their accounting is running off of. They’ve probably got a copy of COGNOS or some other data warehousing app squirreled away in a teeny tiny corner for data crunching experiments that’s sitting atop Oracle or DB2. They’ve probably got a copy of good old SAS or SPSS too, although I’m sure they prefer to use R when possible. But “when possible” are the key words. Not “I prefer.” People assume infinite choice, but that’s just not so.
I feel that DB2 software should be available for developers to get their hands on (for free) so they can train themselves.
Good documentation is required too.
This will increase the trained developer base.
Saying that customers choose commercial databases because it is cheaper in the long run, implicitly states that said customers know what they do, and I know for a fact, from experience that this is not the case.
The majority of software purchases are made by people that are not qualified, and the only thing these people care about is this: “Will this cover my ass in case something goes wrong?”
As a result, many small companies pay through their noses, because they end up over-buying, both in hardware and software.
I also support open source, but my concerns are mixed.
Big companies still stick to Oracle or DB2 for legacy reasons. Open-source RDBMSs were not so mature when they made their choice. I have found senior management reluctant to migrate databases when everything is running fine, even if they are paying for DBAs.
Secondly, with Closed-source RDBMSs like SQL Server, Oracle or DB2 (Enterprise), they can file a suit or blame someone if something goes wrong. Building an enterprise-class RDBMS and giving support to big companies is not a honeymoon.
In real-life cases I have seen customers giving warnings to database vendors when a bug disrupted their work. It costs customers millions.
Now coming to our good open-source era. With PostgreSQL mature enough, we have options to buy support, and over the last few years we have never felt the need for a dedicated DBA, though our company has appointed one so that they can sleep easier at night.