Tag Archives: php

strpos in PHP – like being stung by a needle in a haystack

In PHP when you have a string and want to find out if it contains another string, there are a few ways to do it. You can use regular expressions, use the strstr functions and a few other methods.
The easiest way though is probably by using strpos, which returns the number of the character containing the first occurrence of the thing you’re looking for – and false if the string isn’t found.

Simple – yet with a slight danger.

$haystack = 'This is an example';
if (strpost('This', $haystack)) {
  echo "Found";
} else {
  echo "Not found";
}

In the example above, where we’re looking for the string “This”, the php code will echo “Not found”. The reason is, that the first (and only) occurrence of “This”, is at the begining of the string – character zero.
As strpos returns zero, the if statement is evaluated to false and thus the “Not found” is echoed to the screen.

Fixing the error is simple once you remember the “the first index in a string is zero with strpos” rule:

$haystack = 'This is an example';
if (strpost('This', $haystack) !== false) {
  echo "Found";
} else {
  echo "Not found";
}

Adding the !== false, forces a type check, and as the number zero is (exactly) false, the value echoed is “Found”.

Adding to php arrays

In PHP many things can be done several different ways. Picking which way to do something may be a matter of personal taste or habit. Sometimes however, things may be much clearer for the next developer, if you choose one way over another.

A very simple example of this, is adding a new item to an array. Often I come across this construct:

$valuepairs[] = 'Some value'

It’s valid and compact syntax, but in terms of clarity, I’d prefer this construct anytime:

array_push($valuepairs, 'some value');

PHP 5.4 built-in webserver & Linux (mint/ubuntu)

PHP 5.4 comes with a built-in webserver, which can be useful for development and quick tests. It easily launched from the command-line, but if you’re running Linux Mint or Ubuntu, the PHP version, isn’t 5.4 but 5.3.x. If you don’t have the time/courage/energy to compile PHP 5.4 yourself, some nice fellow on the internet has done the work and made it available through a package repository which makes it a breeze to install.

To install PHP 5.4 on your Ubuntu or Linux Mint simply do this:

1
2
3
sudo add-apt-repository ppa:ondrej/php5
sudo apt-get update
sudo apt-get install php5

(answer yes to any questions asked).

then you should go to go. Verify the update with:

php --version

.. and the "answer" should be something like:

PHP 5.4.4-1~precise+1 (cli) (built: Jun 17 2012 13:01:09)
Copyright (c) 1997-2012 The PHP Group
Zend Engine v2.4.0, Copyright (c) 1998-2012 Zend Technologies

(version numbers and dates are probably subject to change).

To use the webserver, go to the directory you want to be the document root, and launch the webserver with:

php -S localhost:8000

and you can also add a custom php.ini file with the configuration you want with:

php -c ./php.ini -S local:8000

Please remember, that the built-in webserver is only suited for development, but for a quick hack, it sure beats installing Apache or any other webserver.

Moving to PHP on 64 bit… the isssues & challenges

So your current website – if running PHP – and it seems to work just fine. I am however working on a project, where the new servers are running on a 64 bit version of the OS. This change seem to cause a number of potential issues, and as there didn’t seem to be a resource collection the issues, I’ll try to post a few notes on the experience. Please feel free to add applicable notes and links in the comments.

Our first experience was that all our scripts seemed to use a lot more memory than it did on the old server, but there are also number of other 64 bit challenges, you should be aware of. This post is trying to provide an overview of these changes.

The Integer issue

On a 32bit OS, PHP uses 4 bytes (of 8 bit) to define an integer. On a 64 bit system, PHP uses 8 bytes (of 8 bit) to define an integer and thus allows it to store a far langer range of numbers.

You can test this with a simple script such as this:

1
2
3
4
5
 echo 'The integer size on this system is: ';
 echo PHP_INT_SIZE. '<br>';
 
 echo 'The maximum value you can save in an integer is: ';
 echo PHP_INT_MAX;

On 32 bit, the script will output:

The integer size on this system is: 4
The maximum value you can save in an integer is: 2147483647

On 64 bit, the script will output:

The integer size on this system is: 8
The maximum value you can save in an integer is: 9223372036854775807

Generally speaking the only drawbacks of this approach is an increased memory usage and maybe a lower performance – given you script doesn’t need the etra 32 bits provided on a 64 bit system.

This simple little script can simply illustrate the increased memory use:

1
2
3
4
5
6
7
$test = array();
 
for ($counter = 0; $counter < 10000; $counter++) {
  $test[$counter] = $counter;
}
 
echo "Memory peak usage: ", memory_get_peak_usage();

On a 32 bit system the number output from the script is (roughly):
810020.
On a 64 bit system the number output from the script is (roughly):
1517960.
That’s an increase of memory usage of more than 80% on a simple integer array!

Time and dates

Beware that many time related functions in PHP works with integers – such as mktime, strtotime and others uses integers as return values. As long as you use and work with these within the 32bit boundaries, you should be fine.

On 64 bit systems, they are able to handle much larger ranges, which could cause issues, if you allow that to happen.

Memory and performance

As the data volumes being moved around is increased, you could expect a performance penalty. On sites with low traffic volumes, it’s probably not an issue, but if you’re hosting a high volume site, it might be to some extend.

The extra memory seems to be a much larger issue to be aware of. While you may only assume you use a small number of integers, PHP itself does use them many places. When you’re creating arrays – they probably are indexed by integers and many functions return integers as control codes. While the required memory doesn’t double, do expect an overhead of 25-50% depending on what the script does – from the initial experiences; it does seem to be the case.

Bit shifting

Generally speaking, you should be aware every where you use bit shifting operations, as they by their very nature, is quite dependent on the number of bits in the variables available.

Handling Hashes

If you’re using hashes for checksums, beware. Some 64 bit issues may occur.

We’ve seen this issue on the crc32-function. If the result of the CRC32 is a positive number (on a 32 bit system), it will be the same on a 64 bit system. If the CRC32 results in a negative number however, the return result on a 64 bit sytem will be different.

This script:

1
2
3
  echo "<p>Letters 'ab'<br>";
  echo crc32('ab');
  echo "</p>";

Produces the following output on a 32bit system:

Letters 'ab'
-1635563411

But on a 64 bit system, it produces this output:

Letters 'ab'
2659403885

Note the returned hash is always the same, so if you’re using the crc32-hash on a completely 32bit setup OR a complete 64 bit setup, you’re might see any issues, whereas a mixed environment probably will cause issues.

Hash functions such as MD5, SHA1 and others – will always produce the same result no matter what system they’re running on.

PHP, MySQL and 64 bit

Mysql handles integers different than PHP. An integer in mysql has always the same size no matter if it’s running on 32 or 64 bit systems. An integer is always 32 bit. If you need to store a 64bit integer, mysql has an explicit data type – BigInt for this purpose, which is a 64 bit Integer (see mysql manual on Numerical types).

How to handle mysql seems to depend on what kind of PHP solution you’re building. If your application is deployed across several servers (which may be a mix of 32 and 64 bit systems), to two core strategies – is to either handle it in PHP or in Mysql.

Handling the issue in PHP would probably suggest, that you some how “range check” the PHP integer values and make sure the value is within the range allowed by a 32 bit integer.

Handling the issue in Mysql, would mean to just change the integers in the database to BigInts. This would always work, but for all 32 bit system be a less efficient solution.

Defaults may be wrong…

Just a word of warning when using PHP and Mysql – if you’re trying to make efficient code and not utilizing all sort of frameworks and abstractions, you might be in for a small surprise in a default setting.

Usually is slightly lazy and often use the mysql_fetch_assoc function. It provides each row as an associative array, and is quite convenient to work with. Recently however while optimizing some code, I figured I’d switch to using mysql_fetch_array assuming it should be more efficient. The logic being that mapping hash keys to array values wouldn’t be needed and it should use less memory.

It wasn’t the case out of the box. Switching from mysql_fetch_assoc to mysql_fetch_array without doing anything else actually increases you memory use, and is probably slightly slower. By default mysql_fetch_array does not just provide the field values as array indexes, but still maintains the hash keys too.

If you only want the indexes in the returned rows, you need to add an extra parameter to the function stating this explicitly.

  $row = mysql_fetch_array($result, MYSQL_NUM);

I wonder why it was made so. It seems like an odd choice when mysql_fetch_assoc kan provide the row indexed with hash keys – The correct behavior for mysql_fetch_array (by default) ought to be to just return the array without the hashkeys – and have that option available if needed.

PHP best practice: Function Parameters

I’ve been developing web applications for some years now, and while I make no claims to being the world greatest developer, I do figure, that I do have some solid experience which may help others – or at least encourage thoughts and discussion. This is the first in a series of posts, and while it may be from a PHP developers point of view, it may applicable to other programming languages, and maybe even beyond web applications. Here are my four tips on function parameters.

Always have a default value on all parameters

Functions parameters are often used as input to SQL queries, calling webservices or computations. Null or non-existing values often tend to break these things and throws horrible messages to the end-user.

While the parts of the program using your function always should provide proper values, laziness, oversights or unexpected scenarios, may prove it not to be the case. If at all possible, always try to provide default values on all parameters to a function – and if you really can’t make sure it’s handled gracefully.

Choose reasonable defaults

When providing default values, don’t choose extremes. If you’re browsing in a list of usernames, don’t use a (pure) wildcard as default value – use an “A” to list all users starting with the letter A. If you’re function allows a limit on the number of rows returned form a database query (say for paging purposes), set the default to 10, 20 or some other low number, don’t go for worst case scenarios like 9999, or 999999.

If the developer using the function needs plenty of rows, it’s easy to pass a specific value, and if the developer forgets to specify the number of rows, expected, they get a reasonable result, which may help them to ask for more (if actually needed).

Always sanitize input

Even though a given function naturally only will be used with valid input and so on, every function should take steps to secure them selves.
One of the most basic steps is to make sure all input is sanitized. Protecting your function from making SQL injection threats or other security issues, is not the responsibility of the places utilizing the function, but the responsibility of the function it self, and thus it should take steps to make sure it doesn’t introduce security issues.

As basic validation of simple input parameters, look at the ctype functions in PHP. If you can always try to validate against a whitelist (which characters are allowed) instead of blacklists (these characters are not allowed) as missing things which may introduce issues in a black list is harder, than allowing what you expect, to pass through.

Accept an array of key value pairs

If a simple function accepts a single or two parameters, they may be passed as regular parameters, but once the list of possible parameters starts to grow, changing it a single parameter – which is an array with key/value-pairs, seems to be a much more solid long-term solution on keeping the interface developer friendly. Sure you can have endless lists of 10-15 parameters, but if you have default values on the first 14 values and just need to change the last, the code ought to be much more clean by being able to pass an array with the single key/value-pair needed to be changed from the default value.

That was the first four tips on function parameters. More on PHP and Security.

Your mileage may vary

It’s always fun to read articles with tips and tricks by other developers and see how they figure “best practices” are handled. Most developers do seem to thing they observations and practices are easily adopted by anyone, and should be accepted without any argumentation or reasoning behind the advice.

One of the nice examples of this, was a story called “5 tips and tools to develop php applications fast”, and while it may apply to the web applications developed by the author, it’s one of those where I question who is supposed to be the audience. All 5 tips are often found in many textbooks and other stories on the net, and in my experience they are all wrong (more or less naturally).

1) Use a MVC framework – it’s an industry accepted standard.
MVC as an idea may be accepted, but there are many ways of implementing the MVC – even with the various PHP frameworks. If you know the MVC pattern it may be an idea to look at the frameworks supporting it, but don’t get carried away.

  • If all your old code is non-MVC code and you don’t have any plans to MVC-refactor it, it may cause you maintenance overhead to try a completely different style  on a single project/web application.
  • If you aren’t used to using a (MVC) framework you need to invest time in learning it.

Do keep an eye on the overhead caused by the framework. If you web application has a heavy load and the framework causes 5-10% overhead, it may a bad idea.

2) Ajax Frameworks
Great idea.. but to utilize the power of an ajax framework, you need to learn how to use it. Also Ajax isn’t the holy grail of the internet. Sometimes it’s cleaver to use it, other times it isn’t. By using plain old HTML and reloading the page from time to time, it may be easier to debug the application.

Do also notice, that writing great/professional quality Javascript often requires just as great skills as writing professional PHP.

3) IDEs
An IDE – Integrated Development Environment – can be a great idea if you know your way around it, and use it regularly. An IDE is – like any other tool – something you need to invest time and resources into, if you want something back – and especially if we’re talking a beast like Eclipse-based tools.

4) Database creation/management software
Let me skip comments on database creating management tools. I’ve never really used them and don’t know if they’re a pain or a benefit. I have often seen though, that too easy access to the database, leads to sloppy database design (or at least something, that isn’t thought through).

So far it’s been my experience, that people who knows the command-line clients to databases, often has invested much more time in thinking their data models through and often spend a little more time considering the table layout, the field definitions, indexes and other important aspects, than those who use a nice graphical interface.

5) Object Relational Mapping
Object-Oriented programming isn’t a holy grail, and most people doing OO programming (in my experience) doesn’t genuinely do object-oriented programming – instead they use the OO-facilities in the language to slap functions together in a class.

There’s nothing wrong with procedural programming as such, and trying to force object-orientation into your code – and even abstracting it away through Object-relational mapping, has often lead to slow performance and odd database designs. If you know what you’re doing great, but there’s no free lunch, and trying to utilze ORM without having a pretty solid grip on what’s happening and why, rarely does anything good to your productivity.

I’m quite sure for some developers, using some sort of object-relational mapping might be a gain in programmer productivity, but do keep in mind, that if you’re developing a site with massive traffic, the programmer gains may easily be crushed in added server resource requirements – spending a few minutes extra optimizing and tuning applications, may in some cases save hours and hours of cpu cycles once the solution is deployed.

Getting back to the article, I’m sure the advise is well-meant and thought through by the author, but do always – as with all advise – think it through and see how may apply to your adoptation. In may daily development some of it is quite terrible (as I often work on a site with thousands if not millions of pageviews every week).

I’m wondering what my 5 tips would be for developing PHP applications fast would be. Check back in a few days, and there may be an answer – which may or may not – apply to you.

Code archeology

I’ve been spending quite some time the past days digging around in old code (3+ years old with no or very little changes). It’s quite fun noticing what efforts pay off and which doesn’t when you go back to make chance to old stuff. I’m sure my observations aren’t generally applicable. We’re web developers and are keeping web platform (several pretty big websites on a common codebase) alive year after year.

Proper and sane variable- and function names pays off instantly. Incredibly short or misguiding function and variable names doesn’t. While it may be fun naming functions “doMagic” or likewise, chances that you – or someone else maintaining the code years after will probably not appreciate it at all.

Reasonable commenting on functions and tricky code parts often come to good use, but too much commenting is distracting. Some is good, and less is often more valuable.

Version control logs can be quite valuable, if they’re descriptive for the change made. Too often though, was the comment “misc. bug fixing”, “problem solved” or something which years later was of no use or value at all.

Documentation kept apart from the source code seem to have none or very little value. Not once have I bothered to look in the wiki-documents or any other documentation which was in the code. In the few instances I did, minor changes and bug fixes had changed the code to such an extend (while not updating the documentation) that it was misleading at best.

The only cases where the documentation apart seem to make a positive is database diagrams and descriptions. They do seem to have value.

PHP Dynamic Caching with ZendPlatform

I recently had a little fun playing with the dynamic cache available in the Zend Platform. The Zend Platform is a commercial product, which provides a number of cool professional features to a PHP setup – one of these is the option to do dynamic caching.

With dynamic caching you can cache the output of a function for a determined period and instead of doing database queries or web service calls for a feature, you can cache the results, save resources and get faster pages.

If you have access to a PHP setup with Zend Platform, but haven’t looked at dynamic page caching, here’s a little example to get you inspired and started.

Dynamic Caching Example

Let’s suppose you have a busy news website and publish lists of news from categories – on the front page, along side news articles and several other places. to retrieve the most recent articles for each category, we’ve created a class with a function to retrieve the five most recent stories from a category specified by a parameter.

Our first version may look something like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
class NewsCategory
{
        public function getCurrent($category)
        {
                $newsLines = Array();
		if (intval($category)) {
                $sql = sprintf('SELECT * FROM news WHERE category="%n"
                                     ORDER BY date LIMIT 5', $category);
                /* Asume connection to dbserver exists and is valid */
                $res  = mysql_query('newsdb', $sql);
                $data = mysql_fetch_array($res);
                $newsLines = Array();
                while ($newsItem = mysql_fetch_array($res)) {
                        array_push($newsLines, $newsItem);
                }
        }
        return $newsLines;
        }
}

Since news doesn’t happen too often and the lists of news are displayed on many pages, we want to cache the results for an hour – and we want to make sure, that the site works, even if we for some reason decide to disable the Zend Platform.

Rewriting the class to utilize the caching API takes a few changes and the Dynamic Cache-enabled version looks like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
/*
 * SAMPLE CODE; NOT FOR PRODUCTION USE
 */
class NewsCategory
{
        const CACHE_TIME = 3600; // caching time in seconds. Global caching set to 1 hour.
        const CACHE_KEY  = 'NewsCategory_getCurrent_'; // prefix for caching key
 
        /**
         * Returns array with current newsitems from the specified category.
         *
         * @param integer $category
         * @return array with news items
         */
        public function getCurrent($category)
        {
                if (function_exists('output_cache_fetch')) { // caching available?
                        return unserialize(output_cache_fetch(
                                self::CACHE_KEY . $category,
                                "serialize(self::fetchCurrent($category))",
                                self::CACHE_TIME));
                } else { // No ZendPlatform Available
                        return self::fetchCurrent($category);
                }
        }
 
        /**
         * Internal function for looking up the current news identified by $category.
         *
         * @param integer $category
         * @return array - returns empty array if no news is found.
         */
        protected function fetchCurrent($category)
        {
                $newsLines = Array();
		if (intval($category)) {
                $sql = sprintf('SELECT * FROM news WHERE category="%n"
                                     ORDER BY date LIMIT 5', $category);
                /* Asume connection to dbserver exists and is valid */
                $res  = mysql_query('newsdb', $sql);
                $data = mysql_fetch_array($res);
                $newsLines = Array();
                while ($newsItem = mysql_fetch_array($res)) {
                        array_push($newsLines, $newsItem);
                }
        }
        return $newsLines;
        }
}

The changes are:

  • The public function from the first version is replaced by a method, that handles the caching. The function doing the work is now hidden in a protected method in the class.
  • If the caching API isn’t available, the new function just acts a proxy passing data through.
  • The new caching functions creates caching keys from the method name and the parameter to make them predictable and avoid namespace clashes.
  • do notice that the cached results are serialized in the cache – the cache can only store string values.

Before going crazy with Dynamic Caching do consider a few things:

  • Is the data suited for caching? (or do you need the most accurate data every time)
  • What level do you expect the cache ratio to be at? (how often is the cache utilized)
  • Are other caching mechanisms in place? (ie. proxy-servers or likewise)