Aggressive PHP Smart Caching
Many ideas have been proposed for a simple cache layer that sits in front of a web site and passively caches its pages as flat files. I analyzed the methods I have seen people use, noted what I liked and disliked about each, and then added the capabilities I feel such a caching system needs in the real-world scenarios I have encountered while working in the business.
With the resulting wish list in hand, I set out to build a caching system that addresses every item in as little code as possible.
The Wish List
- Cache according to an interval: a single "refresh rate" is set, and all cached files are refreshed according to it.
- The expiration of cached files should be synchronized to that interval, not measured from the time each cached file was created. This minimizes the chance that two pages drawing on the same dynamic data will show conflicting versions of it: because pages with shared data expire at the same moment, they have the opportunity to be re-cached with matching content.
- The system should rely neither on a cron script nor on any other mechanism that explicitly pushes cached content, and it should not need to know where, or whether, cacheable scripts exist.
- Cached files should be stored in a centralized location, not relative to the actual script whose output was cached, so that the entire cache can be flushed easily.
- A single script should be cached separately for each distinct request to it (e.g. foo.php?id=1 and foo.php?id=2 create two different cached files), but the caching system should cap how many requests to a single script can be cached before all cached requests for that script are flushed. Otherwise, an attacker could simply increment a GET variable and fill the cache.
- Once a day, the entire cache should be flushed so that any cached files that no longer have a parent script are cleaned up. This should not rely on any external process such as cron.
- Only GET requests should be cached, not POST. POST actions differ from user to user and should therefore be excluded from the cache.
- The caching system should not rely solely on serving a flat-file cache of a script's output; it should also send an expires header so that a browser will not repeatedly contact the server for the same cached file.
- Since the expiration interval is absolute and not relative to the file's cache time, the expires header that is sent must consistently use the expiration interval and not be based on the cache file's modification time.
- The expires header should always be sent, regardless of whether we will send cached content or generated content.
- There must be a way to exclude certain scripts from the caching mechanism. This is best done by keeping the cache mechanism in a separate file from the app, so that the app never has to be loaded into memory for a cached page; excluded scripts can load the app directly. This also makes it possible to opt out any script that presents user-specific session data in its output. (Note: an alternative to inserting non-sensitive session data directly into a script's output is to store that data in a cookie and let JavaScript display it on page load. The script's actual output is then the same for all users, and the script does not have to be opted out of the caching mechanism.)
- The cache system should perform as few operations as possible in order to serve a cached file.
- The cache system should have low additional overhead to cache a script.
- Object-oriented programming should not be used to implement the caching mechanism, as this would add unnecessary overhead.
- The file containing the caching code should contain only caching code. Non-cache code should be included into the script only if its output needs to be regenerated. In other words, the web application should never be loaded through include or require unless the cache layer has first determined that the script's output must be regenerated, so the caching mechanism should never be part of the web application itself. (As a side note, this method does not rely on PEAR.)
- The cache system must be FAST.
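The synchronized-expiry requirement comes down to a small piece of arithmetic: every request that falls within the same interval must compute the same expiry timestamp, no matter when its cache file was written. The following minimal sketch, assuming PHP 7 or later, isolates that arithmetic. The names `cache_expiry()` and `cache_is_fresh()` are hypothetical helpers, not part of the original code; the calculation is anchored to the top of the hour and treats the exact boundary second as the start of a new interval.

```php
<?php
// Hypothetical helper: expiry timestamp shared by every request in the
// interval that contains $now (intervals are anchored to the top of the hour).
function cache_expiry(int $now, int $interval): int
{
    $hour_start = $now - $now % 3600;            // start of the current hour
    $elapsed    = $now - $hour_start;            // seconds into the hour
    $slot       = intdiv($elapsed, $interval) + 1; // next interval boundary
    return $hour_start + $slot * $interval;      // end of the current interval
}

// Hypothetical helper: a cached file is fresh only if it was written
// during the current interval.
function cache_is_fresh(int $mtime, int $now, int $interval): bool
{
    return cache_expiry($now, $interval) - $mtime <= $interval;
}

$interval = 300;       // 5-minute refresh rate
$t1 = 1700000100;      // 2023-11-14 22:15:00 UTC
$t2 = 1700000190;      // 90 seconds later
echo gmdate('H:i:s', cache_expiry($t1, $interval)), "\n"; // prints 22:20:00
echo gmdate('H:i:s', cache_expiry($t2, $interval)), "\n"; // prints 22:20:00
```

Both requests receive the same expiry, so any pages cached during that slot go stale together. A file written at 22:14:59 is already stale at 22:15:00, while anything written from 22:15:00 onward stays fresh until 22:20:00. If the interval does not divide 3600 evenly, the last slot of each hour is cut short, because the calculation restarts at the top of every hour.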
The Code
The following code is a proposed solution to all of these items. It has been performance tested with ApacheBench (ab) and Xdebug.
File 1: script.php (the file whose output is to be cached)
<?php
require '/abs/path/to/cache.php'; // load the cache layer (it loads the web application only if this page's output must be regenerated)
?>
<html>
<body>
Proceed with your script as you see fit.
</body>
</html>
File 2: cache.php (the caching mechanism)
<?php
$cache_interval=300; // cache refresh rate in seconds (ideally a divisor of 3600, so every interval in the hour has the same length)
$cache_dir='/abs/path/to/cache/dir/'; // absolute path to cache dir (must be writable by the server, must end with a /)
$cache_script_limit=30; // maximum number of different requests to a single script to cache before all of that script's requests are flushed

if ($_SERVER['REQUEST_METHOD']==='GET') { // only GET requests are cached; POST (and any other method) bypasses the cache
    $cache_script='cache_'.hash('md5',$_SERVER['PHP_SELF']); // hash of the current script (substitute md5() if hash() is not available)
    $cache_query=!empty($_SERVER['QUERY_STRING']) ? '_'.hash('md5',$_SERVER['QUERY_STRING']) : ''; // hash of the query string
    $cache_hour=$_SERVER['REQUEST_TIME'] - $_SERVER['REQUEST_TIME'] % 3600; // timestamp of the start of the current hour
    $cache_expires=$cache_hour+(floor(($_SERVER['REQUEST_TIME']-$cache_hour)/$cache_interval)+1)*$cache_interval; // timestamp of the next interval boundary (expiry of the current interval)
    header('Expires: '.gmdate('D, d M Y H:i:s',$cache_expires).' GMT'); // always send the expires header, whether or not we serve from cache
    $cache_mtime=@filemtime($cache_dir.$cache_script.$cache_query); // mtime of the cached file, or false if there is none
    if ($cache_mtime && $cache_expires-$cache_mtime<=$cache_interval) { // cached file exists and was written during the current interval
        readfile($cache_dir.$cache_script.$cache_query); // send cached data to the browser
        exit; // terminate script
    }
    define('CACHE_DIR',$cache_dir); // prepare constants for use in the output handler
    define('CACHE_SCRIPT',$cache_script);
    define('CACHE_QUERY',$cache_query);
    define('CACHE_SCRIPT_LIMIT',$cache_script_limit);
    unset($cache_interval,$cache_dir,$cache_script,$cache_query,$cache_mtime,$cache_expires,$cache_hour,$cache_script_limit); // clean up all cache-related vars
    ob_start('cacheSave'); // start the output buffer that will be cached
}

// INCLUDE WEB APP HERE

function cacheSave($buffer) { // output handler for the cache: saves the cached file
    $cache_mtime=@filemtime(CACHE_DIR.'cache_last_purge'); // time the cache was last purged (once a day)
    if ($_SERVER['REQUEST_TIME']-$cache_mtime>86400) { // last purge was more than a day ago (or never happened)
        foreach (glob(CACHE_DIR.'cache_*') as $file) @unlink($file); // purge the entire cache
        touch(CACHE_DIR.'cache_last_purge'); // mark the purge as having occurred today
    } else { // the cache does not need a full purge
        $cached_requests=glob(CACHE_DIR.CACHE_SCRIPT.'*'); // existing cached requests for this script
        if ($cached_requests && count($cached_requests)>CACHE_SCRIPT_LIMIT) foreach ($cached_requests as $file) @unlink($file); // over the limit: purge this script's requests
    }
    if (!empty($buffer)) @file_put_contents(CACHE_DIR.CACHE_SCRIPT.CACHE_QUERY,$buffer,LOCK_EX); // cache the output buffer
    return $buffer; // pass the buffer on to the browser
}
?>
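One refinement worth considering: `cacheSave()` writes with `LOCK_EX`, but `readfile()` does not take a shared lock, so a request that hits the cache while another request is rewriting the same file could, in principle, read a partially written page. A common alternative, which is not part of the original design, is to write to a temporary file in the cache directory and rename it into place. The sketch below assumes PHP 7 or later and a cache directory on a POSIX filesystem; `cache_write_atomic()` is a hypothetical helper.

```php
<?php
// Hypothetical helper (not in the original cache.php): write $data to $path
// atomically by writing a temp file in the same directory, then renaming it.
function cache_write_atomic(string $path, string $data): bool
{
    // The 'cache_tmp_' prefix means the daily 'cache_*' purge also removes
    // orphaned temp files. If the directory is not writable, tempnam() may
    // fall back to the system temp dir, where rename() is not atomic.
    $tmp = tempnam(dirname($path), 'cache_tmp_');
    if ($tmp === false) {
        return false;
    }
    if (file_put_contents($tmp, $data) !== strlen($data)) {
        @unlink($tmp); // partial or failed write: discard it
        return false;
    }
    @chmod($tmp, 0644); // tempnam() files are typically owner-only
    // On POSIX filesystems rename() atomically replaces any existing file,
    // so readers see either the old page or the new one, never a mix.
    if (!rename($tmp, $path)) {
        @unlink($tmp);
        return false;
    }
    return true;
}
```

To try it, the `@file_put_contents(...)` call in `cacheSave()` could be replaced with `cache_write_atomic(CACHE_DIR.CACHE_SCRIPT.CACHE_QUERY,$buffer)`. The temporary names can never match the per-script glob, because an MD5 hex digest never begins with `tmp_`.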
Try It Yourself
Configuration
The following items need to be configured to make the script operate in your own environment.
- Copy the above code into your own cache.php file.
- Create a cache directory on your server. Ensure that the server has read/write access.
- Set $cache_dir to the path of your cache directory (it must end with a slash).
- Set $cache_interval to the refresh rate (in seconds) for your site. For example, 300 = 5 minutes. Values that divide 3600 evenly (such as 60, 300 or 900) keep every interval the same length.
- Set $cache_script_limit to the maximum number of requests to cache for a single script before that script's cached requests are flushed (suggested: 30).
- If you have a web app to include before executing the requested script, replace the line "// INCLUDE WEB APP HERE" with the require/include call.
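To see why $cache_script_limit defeats the query-string exploit described in the wish list, it helps to look at how cached files are named: every variant of a script shares the prefix `cache_` plus the MD5 of its path, so a single glob finds them all. The following sketch (PHP 7 or later; `cache_file_name()` and `cache_enforce_limit()` are hypothetical helper names) reproduces that naming and the cap check, then simulates 31 requests to `/foo.php` with an incrementing `id` in a temporary directory.

```php
<?php
// Hypothetical helper mirroring cache.php's naming:
// 'cache_' + md5(script path), plus '_' + md5(query string) if there is one.
function cache_file_name(string $self, string $query): string
{
    $name = 'cache_' . md5($self);
    return $query !== '' ? $name . '_' . md5($query) : $name;
}

// Hypothetical helper: delete every cached variant of one script once there
// are more than $limit of them. Returns the number of files removed.
function cache_enforce_limit(string $cache_dir, string $self, int $limit): int
{
    $files = glob($cache_dir . cache_file_name($self, '') . '*') ?: [];
    if (count($files) <= $limit) {
        return 0; // under the cap: keep everything
    }
    $removed = 0;
    foreach ($files as $file) {
        if (@unlink($file)) { // a concurrent request may already have removed it
            $removed++;
        }
    }
    return $removed;
}

// Simulate an attacker incrementing ?id= to fill the cache.
$cache_dir = sys_get_temp_dir() . '/cache_demo_' . getmypid() . '/';
@mkdir($cache_dir);
for ($id = 1; $id <= 31; $id++) {
    file_put_contents($cache_dir . cache_file_name('/foo.php', "id=$id"), "page $id");
}
file_put_contents($cache_dir . cache_file_name('/bar.php', ''), 'other page');

echo cache_enforce_limit($cache_dir, '/foo.php', 30), "\n"; // prints 31: every /foo.php variant is removed
echo count(glob($cache_dir . 'cache_*') ?: []), "\n";        // prints 1: only /bar.php's file remains
```

With the suggested limit of 30, once more than 30 variants of a script exist the next check flushes every cached copy of that script, while other scripts are untouched because their MD5 prefixes differ.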
Your Feedback Requested
You're free to use this code in any way you please, but I would like to hear back from the PHP community about this caching approach. Please give it a try and send me your feedback.
"Aggressive PHP Smart Caching" was written by Alexander Romanovich, 2007.