Tag Archives: mysql

Fetching the most recent entry from a log-table

Sometimes there’s a need to keep a simple log in a database. A common format could be a table with a layout like this:

log
- area (CHAR)
- lognotice (CHAR OR text)
- logtime (TIMESTAMP WHEN the event was logged).

Fetching

Fetching all log entries from a certain area is a simple matter of fetching by the area field, but when building a dashboard with the most recent entry from each area is slightly more complicated – the Query to fetch the data could typically look like this:

SELECT * FROM log log1
WHERE logtime = (
	SELECT MAX(logtime) 
	FROM log log2 
	WHERE log2.area = log.area)

Cleaning up

To keep things clean and tidy, I only sorte data from the past month, week or day (depending on the “log intensity”). To achieve this I usually do something like this:

DELETE FROM log WHERE logtime < ###time###

Mysql: display row count for all tables in a database

When playing the role of the DBA, it’s often useful to get a quick listing of how many rows each table in a database contains. The syntax for this is pretty simple in Mysql:

SELECT TABLE_NAME, table_rows 
FROM INFORMATION_SCHEMA.TABLES 
WHERE TABLE_SCHEMA = '***database name***';

Replace ***database name*** with the actual database name in the SQL above.

Notice that when using innodb tables, it’s only a rough estimate.

Moving to PHP on 64 bit… the isssues & challenges

So your current website – if running PHP – and it seems to work just fine. I am however working on a project, where the new servers are running on a 64 bit version of the OS. This change seem to cause a number of potential issues, and as there didn’t seem to be a resource collection the issues, I’ll try to post a few notes on the experience. Please feel free to add applicable notes and links in the comments.

Our first experience was that all our scripts seemed to use a lot more memory than it did on the old server, but there are also number of other 64 bit challenges, you should be aware of. This post is trying to provide an overview of these changes.

The Integer issue

On a 32bit OS, PHP uses 4 bytes (of 8 bit) to define an integer. On a 64 bit system, PHP uses 8 bytes (of 8 bit) to define an integer and thus allows it to store a far langer range of numbers.

You can test this with a simple script such as this:

1
2
3
4
5
 echo 'The integer size on this system is: ';
 echo PHP_INT_SIZE. '<br>';
 
 echo 'The maximum value you can save in an integer is: ';
 echo PHP_INT_MAX;

On 32 bit, the script will output:

The integer size on this system is: 4
The maximum value you can save in an integer is: 2147483647

On 64 bit, the script will output:

The integer size on this system is: 8
The maximum value you can save in an integer is: 9223372036854775807

Generally speaking the only drawbacks of this approach is an increased memory usage and maybe a lower performance – given you script doesn’t need the etra 32 bits provided on a 64 bit system.

This simple little script can simply illustrate the increased memory use:

1
2
3
4
5
6
7
$test = array();
 
for ($counter = 0; $counter < 10000; $counter++) {
  $test[$counter] = $counter;
}
 
echo "Memory peak usage: ", memory_get_peak_usage();

On a 32 bit system the number output from the script is (roughly):
810020.
On a 64 bit system the number output from the script is (roughly):
1517960.
That’s an increase of memory usage of more than 80% on a simple integer array!

Time and dates

Beware that many time related functions in PHP works with integers – such as mktime, strtotime and others uses integers as return values. As long as you use and work with these within the 32bit boundaries, you should be fine.

On 64 bit systems, they are able to handle much larger ranges, which could cause issues, if you allow that to happen.

Memory and performance

As the data volumes being moved around is increased, you could expect a performance penalty. On sites with low traffic volumes, it’s probably not an issue, but if you’re hosting a high volume site, it might be to some extend.

The extra memory seems to be a much larger issue to be aware of. While you may only assume you use a small number of integers, PHP itself does use them many places. When you’re creating arrays – they probably are indexed by integers and many functions return integers as control codes. While the required memory doesn’t double, do expect an overhead of 25-50% depending on what the script does – from the initial experiences; it does seem to be the case.

Bit shifting

Generally speaking, you should be aware every where you use bit shifting operations, as they by their very nature, is quite dependent on the number of bits in the variables available.

Handling Hashes

If you’re using hashes for checksums, beware. Some 64 bit issues may occur.

We’ve seen this issue on the crc32-function. If the result of the CRC32 is a positive number (on a 32 bit system), it will be the same on a 64 bit system. If the CRC32 results in a negative number however, the return result on a 64 bit sytem will be different.

This script:

1
2
3
  echo "<p>Letters 'ab'<br>";
  echo crc32('ab');
  echo "</p>";

Produces the following output on a 32bit system:

Letters 'ab'
-1635563411

But on a 64 bit system, it produces this output:

Letters 'ab'
2659403885

Note the returned hash is always the same, so if you’re using the crc32-hash on a completely 32bit setup OR a complete 64 bit setup, you’re might see any issues, whereas a mixed environment probably will cause issues.

Hash functions such as MD5, SHA1 and others – will always produce the same result no matter what system they’re running on.

PHP, MySQL and 64 bit

Mysql handles integers different than PHP. An integer in mysql has always the same size no matter if it’s running on 32 or 64 bit systems. An integer is always 32 bit. If you need to store a 64bit integer, mysql has an explicit data type – BigInt for this purpose, which is a 64 bit Integer (see mysql manual on Numerical types).

How to handle mysql seems to depend on what kind of PHP solution you’re building. If your application is deployed across several servers (which may be a mix of 32 and 64 bit systems), to two core strategies – is to either handle it in PHP or in Mysql.

Handling the issue in PHP would probably suggest, that you some how “range check” the PHP integer values and make sure the value is within the range allowed by a 32 bit integer.

Handling the issue in Mysql, would mean to just change the integers in the database to BigInts. This would always work, but for all 32 bit system be a less efficient solution.

Database optimization thoughts

If you’re working on heavy duty websites, knowing your database and how to use it best can make a world of difference in terms of performance, and thus you should always optimize the database. That’s pretty much obvious.

The tricky part is how you do the optimization? Often it requires a lot of reading up on how the database works – strengths, weaknesses and other details — and loads of experience. Having a DBA available to help you optimize would be ideal in some cases, but often you need to do it yourself.

So, is there a “free lunch” recipe with guidelines to help you do the correct optimization? Well, no. All database optimizations are usually case-specific, and the optimizations which worked last time may not be applicable in the current case.

There are however some generic rules, which may help you go in the right direction.

  • Database optimization should start early. Think about performance when designing you database scheme – table layout and column types.
  • Consider the transaction types during the data life cycles. Are you primarily doing reads or writes? How many columns is expected?
  • Learn to use indexes – wisely. Too many indexes is just as bad as no indexes.
  • Try to benchmark various table and column layouts and see how they perform – sometimes you might be surprised and other times just confirm theories.

Substring magic with mysql

Mysql is a wonderful database, and while many use it, most people only scratch the surface of what the database can do. One of the practical functions available is the substring_index function, and an imaginary mailing list example is a nice way to show how to use it.

Let imagine we have a mailinglist in a table named “mailinglist” and it has a (char) column with the email addresses subscribed to the list. We now want to figure out how many users, that are subscribed from the various domains in the list.

Finding the domain name from an email address is quite simple – just find the @ sign – anything past that, will be the domain name and substring_index will do just that. To create our list of domains with the number of subscribers, we simply issue this query:

SELECT SUBSTRING_INDEX(email, ‘@’, -1) AS domain, count(*) as subscribed
FROM mailinglist
GROUP BY domain
ORDER BY subscribed;

Some email providers may use 3rd level domains (sales.example.com, tech.example.com, etc). What it we want to summarize the subscribers on the second level (example.com). No worries – substring_index will help us with that too. The query to do that looks like this:

SELECT  SUBSTRING_INDEX(SUBSTRING_INDEX(email, ‘.’, -2),’@’,-1) AS domain,
count(SUBSTRING_INDEX(SUBSTRING_INDEX(email, ‘.’, -2),’@’,-1)) AS subscribed
FROM  mailinglist
GROUP BY domain
ORDER BY subscribed;

While most developers may master simple queries in SQL, most databases have a library of functions – like substring_index – available and must too often they are ignored and hardly used at all.

If you want to be a better developer, learn to use the entire toolbox available – not just what you know already in Perl, in PHP or what ever you use to do your programming.

Mysql metadata

If you’re a developer and use mysql, I’m sure you’re aware that it’s a database and it quite good at storing data, but one of the neat things about Mysql (and most other databases) is also their ability to provide meta-data on the contents of the database.

Most people know how to use the meta-data queries in the commandline, but if you want you can also use them in your (php/perl/some-other- ) language. Here is a quick guide to some of them.

show databases

The show databases provide a list of all databases available in the datbase-server you’re accessing. It doesn’t tell you which of the databases, you’re allowed to access.

Once a database is selected, you can see a list of tables with the command:

show tables

And with either the ”desc tablename” or with the command

Show columns from tablename

(replace ”tablename” with an actual tablename from the database).

You can exclore which columns and column definition is available.

It’s probably rarely you need to use these functions unless you’re writing a phpmysqladmin replacement – often a script makes assumptions on which tables and columns exist.

If you’re developing an upgrade to an existing application/webbsite/script and the update requires database changes, you can use these functions to check if the database layout version is the one matching you application version needs. By doing this, you can provide much better feedback to the user on what’s wrong with the script, instead of just breaking horribly with database errors.

Should you use sql specific statements?

It seems there are two camps when it comes to SQL and how to do database optimizations – the “generic camp” and “the specialist camp”. While I don’t consider myself an extremist, I am absolutely in the specialist camp and this little post is an explanation of why.

SQL is a generic database langauge . There are a few different standards in use (the language has progressed over time), but the core of the SQL language is pretty much the standard in most databases. It’s probably also standard – in any database – that the SQL standard has been extended with database-specific extensions which provides optimizations, functions or other options not available in the SQL standard.

Using these database-specific extensions while developing your application ties your application/website to the specific database, and if you need to switch database at some point, your need to rewrite your applications SQL statements, so they aren’t tied to that specific database.

While this may be true I haven’t once during my ten years of web development, once had to switch database either during development nor during operations. I’m sure it happens in some cases, but I’m also sure that those cases are pretty rare, and if you need to go over your application and change the SQL statements, spending time on that is probably one of the easiest parts of a “technology switch” (ie. switching from Mysql to an Oracle cluster).

In most cases, using and utilizing database specific extensions can provide you with some easy optimizations and boost the performance significantly. While you probably can avoid using them, you’ll probably need to move the functions into you application or make more complex database queries. Optimization is usually an evolution, not a revolution. If your performance isn’t as expected, the first step is usually where are the bottlenecks, where can we optimize the current state of things – not switching database, not switching programming language.

Before you become a SQL purist, do make a calculated guess on what the “database switch probability” is. In most cases it’ll probably be less than 1%, and if this is the case, all common sense should tell you to use the tool available to the best of your ability, right?

Basics on tables in Mysql

Navigating around in mysql is quite easy. You can see which databases by using the command “show databases“. There will always exist a database called “mysql” which is used by mysql itself.

If your user account has access to a database, you can access it by using the “use DATABASENAME” command. You can se which tables exist within a database by using the “show tables” command – and see details for each table by issuing the “desc TABLENAME” command.

changing Tables

All alterations of database tables is done with the alter table command. The main challenge is use the command beyond just making it work. Here’s a few clues to get the most of it.

Let’s make a simple table as an example:

CREATE TABLE demotable (
id INTEGER UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY);

Adding a column

ALTER TABLE demotable ADD COLUMN label CHAR(20);

More…

Find more details in the Mysql manual.

Mysql: Random dice

Getting a random roll of the dice:

          CREATE TABLE dice (
            d_id int(11) NOT NULL auto_increment,
            roll int,
            PRIMARY KEY  (d_id)
          );

          insert into dice (roll) values (1);
          insert into dice (roll) values (2);
          insert into dice (roll) values (3);
          insert into dice (roll) values (4);
          insert into dice (roll) values (5);
          insert into dice (roll) values (6);

          select roll from dice order by rand() limit 1;