Tag Archives: sql

Fetching the most recent entry from a log-table

Sometimes there’s a need to keep a simple log in a database. A common format could be a table with a layout like this:

log
- area (CHAR)
- lognotice (CHAR OR text)
- logtime (TIMESTAMP WHEN the event was logged).

Fetching

Fetching all log entries from a certain area is a simple matter of fetching by the area field, but when building a dashboard with the most recent entry from each area is slightly more complicated – the Query to fetch the data could typically look like this:

SELECT * FROM log log1
WHERE logtime = (
	SELECT MAX(logtime) 
	FROM log log2 
	WHERE log2.area = log.area)

Cleaning up

To keep things clean and tidy, I only sorte data from the past month, week or day (depending on the “log intensity”). To achieve this I usually do something like this:

DELETE FROM log WHERE logtime < ###time###

Substring magic with mysql

Mysql is a wonderful database, and while many use it, most people only scratch the surface of what the database can do. One of the practical functions available is the substring_index function, and an imaginary mailing list example is a nice way to show how to use it.

Let imagine we have a mailinglist in a table named “mailinglist” and it has a (char) column with the email addresses subscribed to the list. We now want to figure out how many users, that are subscribed from the various domains in the list.

Finding the domain name from an email address is quite simple – just find the @ sign – anything past that, will be the domain name and substring_index will do just that. To create our list of domains with the number of subscribers, we simply issue this query:

SELECT SUBSTRING_INDEX(email, ‘@’, -1) AS domain, count(*) as subscribed
FROM mailinglist
GROUP BY domain
ORDER BY subscribed;

Some email providers may use 3rd level domains (sales.example.com, tech.example.com, etc). What it we want to summarize the subscribers on the second level (example.com). No worries – substring_index will help us with that too. The query to do that looks like this:

SELECT  SUBSTRING_INDEX(SUBSTRING_INDEX(email, ‘.’, -2),’@’,-1) AS domain,
count(SUBSTRING_INDEX(SUBSTRING_INDEX(email, ‘.’, -2),’@’,-1)) AS subscribed
FROM  mailinglist
GROUP BY domain
ORDER BY subscribed;

While most developers may master simple queries in SQL, most databases have a library of functions – like substring_index – available and must too often they are ignored and hardly used at all.

If you want to be a better developer, learn to use the entire toolbox available – not just what you know already in Perl, in PHP or what ever you use to do your programming.

Should you use sql specific statements?

It seems there are two camps when it comes to SQL and how to do database optimizations – the “generic camp” and “the specialist camp”. While I don’t consider myself an extremist, I am absolutely in the specialist camp and this little post is an explanation of why.

SQL is a generic database langauge . There are a few different standards in use (the language has progressed over time), but the core of the SQL language is pretty much the standard in most databases. It’s probably also standard – in any database – that the SQL standard has been extended with database-specific extensions which provides optimizations, functions or other options not available in the SQL standard.

Using these database-specific extensions while developing your application ties your application/website to the specific database, and if you need to switch database at some point, your need to rewrite your applications SQL statements, so they aren’t tied to that specific database.

While this may be true I haven’t once during my ten years of web development, once had to switch database either during development nor during operations. I’m sure it happens in some cases, but I’m also sure that those cases are pretty rare, and if you need to go over your application and change the SQL statements, spending time on that is probably one of the easiest parts of a “technology switch” (ie. switching from Mysql to an Oracle cluster).

In most cases, using and utilizing database specific extensions can provide you with some easy optimizations and boost the performance significantly. While you probably can avoid using them, you’ll probably need to move the functions into you application or make more complex database queries. Optimization is usually an evolution, not a revolution. If your performance isn’t as expected, the first step is usually where are the bottlenecks, where can we optimize the current state of things – not switching database, not switching programming language.

Before you become a SQL purist, do make a calculated guess on what the “database switch probability” is. In most cases it’ll probably be less than 1%, and if this is the case, all common sense should tell you to use the tool available to the best of your ability, right?