scanning a directory with thousands of files

tl;dr: scandir() is slow and loads the whole directory listing into memory. Use opendir() and readdir() instead.
Sometimes in an older codebase things are written that work at first, then get slower over time, and eventually reach a point where they fail completely. Originally, scanning a few hundred files wasn't particularly performance intensive, but as the project scaled over time the directory grew to hundreds of thousands of files, and the PHP instance (PHP 5.6) would run out of memory after trying and then just stall completely. scandir($directory) was the offender. I don't have metrics other than 'it didn't work' with about half a million 50kb files.

I've since replaced it with opendir() and readdir(), which work through the directory via a handle and start returning entries straight away, instead of loading the entire directory listing into memory (or trying to, with a large number of files) before doing anything, as scandir() does. What used to be there was basically:
//array_diff to skip the . and .. dot paths that scandir lists under linux
$dir_array = array_diff(scandir($directory), array('..', '.')); //Would fail here under load

foreach ($dir_array as $filename) {
    //do stuff
} //endforeach
And now looks like:
if ($handle = opendir($directory)) {
    while (false !== ($filename = readdir($handle))) {
        if ($filename != "." && $filename != "..") { //ignoring dot paths under linux
            //do stuff
        } //endif linux dir filename
    } //endwhile
    closedir($handle); //release the directory handle when done
} //endif opendir
This quickly got things working again without too much hassle, being virtually a drop-in replacement, although it was still slow simply because of the sheer number of files to deal with. Eventually things were changed so that a single directory accumulating millions of files over time doesn't happen (and shouldn't ever happen).
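One common way to prevent that kind of build-up (shown purely as an illustration, not necessarily how it was solved in this codebase) is to shard files into hashed subdirectories so that no single directory ever holds more than a small slice of the total:

//Hypothetical sketch: spread files across subdirectories keyed by a hash
//prefix of the filename, so listing any one directory stays cheap.
function sharded_path($base_dir, $filename, $levels = 2) {
    $hash = md5($filename);
    $path = $base_dir;
    for ($i = 0; $i < $levels; $i++) {
        $path .= '/' . substr($hash, $i * 2, 2); //two hex chars per level, e.g. /ab/cd
    }
    if (!is_dir($path)) {
        mkdir($path, 0755, true); //create the shard directories on demand
    }
    return $path . '/' . $filename;
}

//e.g. sharded_path('/data/uploads', 'report.pdf') => /data/uploads/xx/yy/report.pdf
//where xx and yy are the first four hex characters of md5('report.pdf')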

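As a closing aside, since PHP 5.5 the same readdir() loop can be wrapped in a generator, which keeps the one-entry-at-a-time behaviour but lets the calling code stay a plain foreach like the old scandir() version. A minimal sketch (not what actually ended up in the codebase):

//Sketch only: stream directory entries one at a time with a generator
function dir_entries($directory) {
    $handle = opendir($directory);
    if ($handle === false) {
        return; //nothing to yield if the directory can't be opened
    }
    while (false !== ($filename = readdir($handle))) {
        if ($filename != "." && $filename != "..") {
            yield $filename; //one filename at a time, no big array in memory
        }
    }
    closedir($handle); //runs once the loop has been fully consumed
}

foreach (dir_entries($directory) as $filename) {
    //do stuff
}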