MySQL has long been known as the Web developers’ database platform. In fact, for many years, it was considered the stepchild of "real" relational database systems such as IBM’s DB2, Oracle and Microsoft’s SQL Server. However, in recent years, MySQL has grown in leaps and bounds both from the development side as well as in the eyes of developers and corporate IT staff internationally.
Developed partially in the 1980s, with the SQL portion built in 1995, its core developer, Michael Widenius (along with David Axmark and Allan Larsson), also founded MySQL AB. The current stable release is version 4.01.
MySQL is, of course, also an open source solution. According to Arjen Lentz of MySQL AB, the free and open source model is the best — it involves public scrutiny of the code, third party security audits, feedback throughout the development process, and battleÂtesting in many different environments.
MySQL’s penetration into the international market is vast, including more than 20 million total downloads, 4 million active installations and nearly 35,000 downloads daily. Included in this user base are such names as Yahoo!, Google, Cisco, NASA, Dunn & Bradstreet, Lufthansa, Hyperion and Sabre Holdings.
The licensing is unique as well, with a freely downloadable version available under the GPL open source license, and a commercial license available for those who need support and want warranties.
Where We Are Now? Version 4.0x (Production)
MySQL has been ideal in the Java, Perl and PHP Web development environments for some time. At individual site level, and at enterprise level, powerful Web applications have found success running on the MySQL database engine. Its capabilities were further reinforced through recent public benchmarks that compared MySQL’s performance against the industry heavyweights including IBM, Microsoft and Oracle.
The surprise to many non-adopters of the open source database was that, performance-wise, MySQL ran neck and neck with Oracle and outperformed the others in the relentless tests carried out by eWeek’s Test Labs.
There are some limitations in the current stable version, two of the most significant being subqueries and stored procedures. As Web applications become more sophisticated, these issues have emerged as barriers to wider acceptance of the platform. Some firms, such as NASA, have worked around the limitations using external scripting such as Perl.
However, there are also additions to the 4.0x production version that have helped position MySQL as a serious player in the database engine market. Significant among these changes were the inclusion of the InnoDB for transactional database efforts and, as eWeek noted, the ability to use both InnoDB and ISAM for non-transaction needs in the same database — an advantage that’s unique to MySQL. When paired with the ability to cache queries, this latter capability was the driver behind MySQL’s impressive performance. This cache feature allows the results of large queries to be stored in memory, enabling later queries to return results almost instantly.
Additional functionality, such as database replication, rounds out the feature set that makes MySQL the primary choice for Web app development.
Note that The MySQL Manual online has all the documentation necessary to set up and utilize the features discussed in this article. Additionally, if you plan to experiment with version 4.1 or later, be sure to use a development environment, as some changes cannot be downgraded after installation and grant updates are completed.
Why the InnoDB?
The InnoDB provides a transaction storage environment including commit, rollback and crash-recovery. This is extremely useful in Web applications that are used by multiple users concurrently, such as order management or CRM solutions. The creation of InnoDB tables is simple, as is illustrated by the addition to the create table statement seen below.
CREATE TABLE prospects (a INT, b CHAR (20), INDEX (a)) ENGINE=InnoDB;
The InnoDB also establishes its own dictionary according to MySQL’s documentation, so you can have numerous tables of the same name (i.e. clients, customers, orders, and so on) in databases that use InnoDB.
Finally, InnoDB allows for mixing of queries on InnoDB and non-InnoDB tables within the same sql, and of course performance is optimized when used in conjunction with the query cache.
The Query Cache
The query cache is most useful in the Web environment. Database-driven sites are primarily dynamic pages generated from database content. The cache can significantly improve the performance of these environments.
If you do not have the query cache turned on, you can do so as a test by simply using
mysqld --query_cache_size=10m, which sets the query cache to 10MB (use k for KB and g for GB). You can configure this to start at boot of MySQL using the set command:
mysql> SET query_cache_size = 10240;
Using the new MySQL Administrator, replication can now be managed through a graphical user interface as well as the mysql command line. Replication can be used as a backup tool, as a load-balancing tool, or as a migration tool when moving to new hardware (and possibly a new hosting environment where IPs are changing).
Considering the numerous options and changes in replication privileges as new versions are released, a discussion of the topic is beyond the scope of this article. See the replication documentation on the MySQL site for more information.
The Next Step – Version 4.1 (alpha)
The primary identifying addition to this release is subqueries. Subqueries are simply select statements nested within existing select statements. Unfortunately, in versions prior to 4.1, subqueries either needed to be avoided or required the development of somewhat complex joins and/or unions. Simply, subqueries are easier to write and read when reviewing an sql.
A simple yet powerful example of a subquery that cannot be accomplished with a join is show below:
SELECT partner_name, number_clients_referred FROM partners WHERE number_clients_referred = (SELECT MAX(number_clients_referred) FROM employees);
This if course finds all of the values in the table ‘partners’ that equal a maximum value in the table called ’employees’.
The "Golden Egg" – Version 5.0
Version 5.0 is currently under development, but in the preview, the one substantial feature that’s present in the known relational database systems is stored procedures. A stored procedure (sp) allows for a collection of sql statements to be stored in one command and called. These commands run on the server-side only and can be useful as they don’t require client side interaction once they’re called. Additionally, sps can increase data security and integrity.
Sps can be used for activities as simple as data validation on multiple raw imports of external data being processed and loaded into a database. They’re also useful to carry out sophisticated report generation, returning to the client side results that cannot be handled in a single sql statement.
For example, imagine you’re developing a yellow pages Web application, and receive business data from numerous sources, which is then aggregated and loaded into a single database. In this case, you could use an sp to validate that all Website and email addresses adhere to your proper format, and all phone numbers appear in the correct format, with dashes to separate country code, area code, prefix and so on.
Suppose you’re deploying a reporting solution that compiles reports for investment advisors who wish to see their clients’ accounts and transactions, as well as the data calculated to show commissions and the rate of return on investments. A stored procedure will enable you to wrap these various sql statements into one object that can result in a complete online statement for viewing and distribution.
Just announced at the MySQL Users Conference in Florida, MySQL has released a preview of MySQL Cluster. The system claims 99.999% uptime availability and seeks to enter the market now held predominantly by firms such as Oracle, IBM and Microsoft’s sql servers.
As I noted in Open Sourcery on the day of the announcement, one of the key factors for Web developers is the dual licensing (GPL open source and commercially supported license). This enables a Web developer to download and build a test environment of MySQL Cluster without any licensing costs (which is impossible with other vendors!) as a proof of concept for current and potential customers.
While the system is being touted as one for massive installations such as current MySQL customers NASA, Sabre Holdings and others, the ability to bring high-level clustering to any firm that needs critical uptime for commerce or extranets is substantial.
Though the current download offers preview mode only, MySQL has been working behind the scenes to build custom clusters for many of their existing enterprise clients. By bringing the product to preview, the open source community at large will be able to review, comment and recommend features in the evolution of this new product.
Two papers are now available from MySQL. One serves as an introductory whitepaper; the other provides a technical overview of the new MySQL Cluster architecture.
- Introductory whitepaper – http://www.practicalapplications.net/kb/mysql-cluster-whitepaper.pdf
- Architecture Overview – http://www.practicalapplications.net/kb/mysql_cluster_architecture.pdf
MySQL has surely established itself as the quintessential database for Web development. Furthermore, it is a rising star in enterprise environments where data sizes in the terabytes are common. MySQL has been shown to handle 50 million records per table and will only improve its scalability in release 5.
If you have not yet experimented with MySQL, a great place to start is with Kevin Yank’s tutorial series on using MySQL and PHP in the development of dynamic Websites.