[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

Re: [school-discuss] Metadot and MIT



PostgreSQL vs. MySQL

Building better databases

By Brian Jepson

To many people, PostgreSQL and MySQL seem like similar, alternative databases. Both are quickly gaining popularity. Based on the track records of older versions, there's a lot of debate over the speed of PostgreSQL and the durability of MySQL. But times have changed and each database has progressed. On both counts, the two packages are the closest they've ever been, so when deciding which to use in a Web application, a developer doesn't always have a clear winner.

MySQL's claim to fame is that it provides a reasonable set of features, such as built-in SQL functions, that follow the 80/20 rule: It has the 20 percent of SQL capabilities that are needed for 80 percent of database applications. Developers of simple applications can live without the remaining features, such as stored procedures and subqueries, or can work around them with creative client-side programming.

PostgreSQL, on the other hand, provides more features than MySQL. These include more SQL functions, server-side procedural languages, and sophisticated methods for date manipulation. PostgreSQL also offers object-relational capabilities and geometric data types. If you're developing an application that has highly complex business rules, PostgreSQL lets you handle business logic on the database server.

The ACID Test

A good way to differentiate databases and test overall quality is to perform an ACID test. ACID is an acronym that describes four properties of a robust database system: atomicity, consistency, isolation, and durability. These features are scoped to a transaction, which is a unit of work that the programmer can define. A transaction can combine one or more database operations, for example:

1. Atomicity is an all-or-none proposition. Suppose you define a transaction that contains an UPDATE, an INSERT, and a DELETE statement. With atomicity, these statements are treated as a single unit, and thanks to consistency (the C in ACID) there are only two possible outcomes: either they all change the database or none of them do. This is important in situations like bank transactions where transferring money between accounts could result in disaster if the server were to go down after a DELETE statement but before the corresponding INSERT statement.

2. Consistency guarantees that a transaction never leaves your database in a half-finished state. If one part of the transaction fails, all of the pending changes are rolled back, leaving the database as it was before you initiated the transaction. For instance, when you delete a customer record, you should also delete all of that customer's records from associated tables (such as invoices and line items). A properly configured database wouldn't let you delete the customer record, if that meant leaving its invoices, and other associated records stranded.

3. Isolation keeps transactions separated from each other until they're finished. Transaction isolation is generally configurable in a variety of modes. For example, in one mode, a transaction blocks until the other transaction finishes. In a different mode, a transaction sees obsolete data (from the state the database was in before the previous transaction started). Suppose a user deletes a customer, and before the customer's invoices are deleted, a second user updates one of those invoices. In a blocking transaction scenario, the second user would have to wait for the first user's deletions to complete before issuing the update. The second user would then find out that the customer had been deleted, which is much better than losing changes without knowing about it.

4. Durability guarantees that the database will keep track of pending changes in such a way that the server can recover from an abnormal termination. Hence, even if the database server is unplugged in the middle of a transaction, it will return to a consistent state when it's restarted. The database handles this by storing uncommitted transactions in a transaction log. By virtue of consistency (explained above), a partially completed transaction won't be written to the database in the event of an abnormal termination. However, when the database is restarted after such a termination, it examines the transaction log for completed transactions that had not been committed, and applies them.

PostgreSQL is ACID compliant. The standard table handler for MySQL is not ACID compliant because it doesn't support consistency, isolation, or durability. However, the default table handler supports atomicity using table locks. And fortunately, there are table handlers available for MySQL that provide various degrees of compliance. As I'll describe later, NuSphere's Gemini table handler is fully compliant. So are the Berkeley DB and InnoDB table handlers included in the most recent versions of MySQL. If you want to use these table handlers, you need to either obtain a version of MySQL in which they're specifically compiled, or compile MySQL with these handlers enabled. The MySQL documentation for CREATE TABLE gives instructions on how to use one of these alternative table handlers (at www.mysql.com/documentation).

Because of its limited feature set, MySQL is very fast. You can make it blindingly fast if your application can adapt to things like in-memory tables. As far as durability is concerned, you might lose some data if the plug is pulled in the middle of a transaction. With PostgreSQL, you have many features and can be confident that your data is safe. Yet, if you implement all of the features at once, your application's performance suffers. Fortunately, PostgreSQL has recently made significant performance improvements.



Dr. Robert G. Rittenhouse wrote:
On Fri, 8 Oct 2004, Michael Dean wrote:

  
ez publisher!  With regard to mysql, to distribute it within your 
organization, such as on several servers, requires a commercial 
license. 
    

I don't see this. According to the MySQL FAQ:
   With  the  GPL  license,  MySQL is available free of charge. Users may
   download  the  software  for free and modify, integrate and distribute
   it.  However,  GPL  users  must  abide  by the rules of the GPL, which
   stipulate  that  if  a  MySQL-based  application is redistributed, the
   complete  source  code  for  this  application  must  also be open and
   available for redistribution.

It's standard GPL. Within an organization you can do whatever you want. 
It's only distribution outside the organization that requires GPL 
source code availability, and even there MySQL is excepting many FOSS 
licenses.

The Apache license is certainly friendlier to those who might wish to 
incorporate the code in a commercial product. 

You may well prefer Postgresql for other reasons but in this application I 
don't see a problem with MySQL's GPL licensing.

Parenthetically, as a strong proponent of open systems/standards (as 
well as open source) I find it quite disappointing that switching DBMS's 
is not a simple component swap.

  
Postgresql is Apache licensing.  Many CMS or CRM packages can 
now use postgresql but not mysql.  There are several comparison studies 
on the web which document the differences.  In fact, at the last 
Linuxwolrd, mysql sought the assistance of postgresql in improving.  
Mysql is better at marketing.  My comparison was taken from cmsmatrix.org.