Tuesday, February 8, 2011

Would performance suffer using autoload in PHP and searching for the class file?

I've always struggled with how best to include classes in my PHP code. Pathing is usually an issue, but a few minutes ago I found this question, which helps with that dramatically. Now I'm reading about __autoload and thinking that it could make developing my applications much easier. The problem is that I like to maintain a folder structure that separates areas of functionality, as opposed to throwing everything into a general /lib folder. So if I override autoload to do a deep search of a class folder, including all subfolders, what performance hits can I expect?

Obviously this will depend on scale, the depth of the folder structure, and the number of classes, but in general I'm asking whether it will cause problems on a medium-scale project.

  • __autoload is great, but calling stat() on every file in a recursive search is expensive. You might want to look at building a tree of files to use for autoloading. In my framework, I consistently name files for their classes and look them up in a cached map.

    Check out http://trac.framewerk.org/cgi-bin/trac.fcgi/browser/trunk/index.php [dead link] starting at line 68 for an idea of how this can be done.

    Edit: And to more directly answer your question, without caching, you can expect a performance hit on a site with medium to heavy traffic.

    JoshReedSchramm : This is really cool and I love it. Quick question, though: you said it is cached. What is the scope of the cache? I.e., how does mapfiles not get called again if the user changes page? I see that autoload checks whether the collection isset, but it seems like the collection is reset for every page load.
    Crad : It shouldn't be reset for every page load; that's where the load comes from on a heavily hit site. I have it write a file in tmp, and when I change my application by adding classes, I remove the cached data by deleting that file, which is automatically recreated on the next hit.
    From Crad
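
The cached map described in the answer above could be sketched roughly as follows. This is not the code from the linked framework: build_class_map(), load_class_map(), register_cached_autoloader(), and the cache file location are hypothetical names, and the sketch assumes every class lives in a file named after it.

```php
<?php
// Hypothetical helper: walk $baseDir once and map
// "ClassName" => "/full/path/ClassName.php".
function build_class_map($baseDir)
{
    $map = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($baseDir, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        if ($file->isFile() && $file->getExtension() === 'php') {
            $map[$file->getBasename('.php')] = $file->getPathname();
        }
    }
    return $map;
}

// Hypothetical helper: return the cached map, rebuilding it only when the
// cache file is missing (delete the file after adding classes).
function load_class_map($baseDir, $cacheFile)
{
    if (is_file($cacheFile)) {
        return unserialize(file_get_contents($cacheFile));
    }
    $map = build_class_map($baseDir);
    file_put_contents($cacheFile, serialize($map), LOCK_EX);
    return $map;
}

// Hypothetical helper: register an autoloader backed by the cached map.
function register_cached_autoloader($baseDir, $cacheFile)
{
    $map = load_class_map($baseDir, $cacheFile);
    spl_autoload_register(function ($class) use ($map) {
        if (isset($map[$class])) {
            require_once $map[$class];
        }
    });
}
```

With this arrangement the recursive scan runs only on the first request after the cache file is deleted; every later request pays for one file read and an array lookup per class.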
  • A common pattern (PEAR and Zend Framework, for example) is to make the class name reflect the path, so Db_Adapter_Mysql will be at /Db/Adapter/Mysql.php, under some directory that's been added to the include path.

    JoshReedSchramm : An interesting thought and one I'm certainly not opposed to. It makes the class names kind of long, but that's OK in my book; if my folder structure makes sense, so will my class names.
    From Greg
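
As a minimal sketch of the naming convention above: the class name is converted to a relative path and checked against one base directory, so each lookup costs a single file check instead of a directory search. class_to_path() and register_pear_autoloader() are hypothetical names, and the single $libDir base directory is an assumption made for brevity (the answer uses the include path instead).

```php
<?php
// Hypothetical helper: "Db_Adapter_Mysql" => "Db/Adapter/Mysql.php".
function class_to_path($class)
{
    return str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
}

// Hypothetical autoloader; $libDir is an assumed base directory.
function register_pear_autoloader($libDir)
{
    $root = rtrim($libDir, '/\\');
    spl_autoload_register(function ($class) use ($root) {
        $file = $root . DIRECTORY_SEPARATOR . class_to_path($class);
        // Return quietly on a miss so other autoloaders can still run.
        if (is_file($file)) {
            require_once $file;
        }
    });
}
```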
  • Hunting for files all over the place will make things slower (many more disk hits). Loading all of your classes in case you might need them will make things take more memory. Specifying which classes you need in every file is difficult to maintain (i.e. they don't get removed if they're no longer used).

    The real question is which of these is more important to you? They're all tradeoffs, in the end, so you have to pick one. It's arguable, though, that most of the overhead in the second and third options has to do with actually compiling the code. Using something like APC can significantly reduce the overhead of loading and compiling every class on every page load.

    Given the use of APC, I would likely take the approach of dividing up my code into modules (e.g. the web interface module, the database interaction module, etc.) and have each of those modules import all the classes for their module, plus classes from other modules they may need. It's a tradeoff between the last two, and I've found it works well enough for my needs.

    From Dan Udey
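
The module-level approach described above might look something like this sketch, in which each module eagerly requires its own class files from an explicit list. load_module() and the file list are hypothetical; the answer does not show its actual layout.

```php
<?php
// Hypothetical helper: require every class file a module declares up front.
// $classFiles holds paths relative to $moduleDir.
function load_module($moduleDir, array $classFiles)
{
    $loaded = array();
    foreach ($classFiles as $relative) {
        $path = rtrim($moduleDir, '/\\') . DIRECTORY_SEPARATOR . $relative;
        require_once $path; // no searching: the module lists exactly what it needs
        $loaded[] = $path;
    }
    return $loaded;
}
```

The tradeoff is the one the answer names: no disk searching, at the cost of loading some classes a given request may never use.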
  • There are two ways you could easily do this. First, name your classes so that the name itself defines where to find the file:

    // Note: __autoload() was deprecated in PHP 7.2 and removed in PHP 8.0. On those
    // versions, rename this function and register it with spl_autoload_register().
    function __autoload($classname)
    {
        try
        {
            // Nothing to do if the class or interface is already defined.
            if (class_exists($classname, false) || interface_exists($classname, false))
            {
                return;
            }

            // split() was removed in PHP 7; explode() does the same job here.
            $class = explode('_', strtolower(strval($classname)));

            // Only handle classes carrying our own prefix.
            if (array_shift($class) != 'majyk')
            {
                throw new Exception('Autoloader tried to load a class that does not belong to us ( ' . $classname . ' )');
            }

            switch (count($class))
            {
                case 1: // Core Class - matches Majyk_Foo - include /core/class_foo.php
                    $file = MAJYK_DIR . 'core/class_' . $class[0] . '.php';
                    break;

                case 2: // Subclass - matches Majyk_Foo_Bar - includes /foo/class_bar.php
                    $file = MAJYK_DIR . $class[0] . '/class_' . $class[1] . '.php';
                    break;

                default:
                    throw new Exception('Unknown Class Name ( ' . $classname . ' )');
            }

            if (file_exists($file))
            {
                require_once($file);

                // The file loaded, but it must actually define the requested class.
                if (!class_exists($classname, false) && !interface_exists($classname, false))
                {
                    throw new Exception('Class cannot be found ( ' . $classname . ' )');
                }
            }
            else
            {
                throw new Exception('Class File Cannot be found ( ' . str_replace(MAJYK_DIR, '', $file) . ' )');
            }
        }
        catch (Exception $e)
        {
            // spl_autoload($classname);
            echo $e->getMessage();
        }
    }
    

    Or, second, use multiple autoloaders. PHP >= 5.1.2 has the SPL library, which allows you to register multiple autoloaders. You add one for each path, and PHP will find the class on its way through them. Or just add the paths to the include path and use the default spl_autoload().

    An example

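    function autoload_foo($classname)
    {
        // Only include the file if it exists, so a miss falls through to the
        // next registered autoloader instead of raising a fatal error.
        $file = 'foo/' . $classname . '.php';
        if (file_exists($file))
        {
            require_once($file);
        }
    }

    function autoload_bar($classname)
    {
        $file = 'bar/' . $classname . '.php';
        if (file_exists($file))
        {
            require_once($file);
        }
    }

    spl_autoload_register('autoload_foo');
    spl_autoload_register('autoload_bar');
    spl_autoload_register('spl_autoload'); // Default SPL Autoloader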
    
    From Mez
  • Autoload is a great PHP feature that helps a lot. Performance won't suffer if you use a sensible naming scheme, for example:

    1. Every library lives in a "packages" folder.
    2. Every class is located by replacing each "_" in the class name with "/" and adding ".php" at the end.

    class = My_App_Smart_Object
    file = packages/My/App/Smart/Object.php

    Another benefit of this approach (used by almost any framework) is that your code ends up better organized :-)

    From andy.gurin
  • I tend to use a simple approach where __autoload() consults a hash mapping class names to relative paths, which is contained in a file that's regenerated using a simple script which itself performs the recursive search.

    This requires that the script be run when adding a new class file or restructuring the code base, but it also avoids "cleverness" in __autoload() which can lead to unnecessary stat() calls, and it has the advantage that I can easily move files around within my code base, knowing that all I need to do is run a single script to update the autoloader.

    The script itself recursively inspects my includes/ directory, and assumes that any PHP file not named in a short list of exclusions (the autoloader itself, plus some other standard files I tend to have) contains a class of the same name.

    From Rob
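
A rough sketch of the generated-map approach above, assuming the map is written out as a PHP file that returns an array. generate_autoload_map(), register_map_autoloader(), and the file names are hypothetical; the answer does not show its actual script.

```php
<?php
// Hypothetical generator: scan $includesDir and write a PHP file returning
// array('ClassName' => 'relative/path/ClassName.php'). $exclude lists file
// names that do not contain a class of the same name.
function generate_autoload_map($includesDir, $mapFile, array $exclude = array())
{
    $root = rtrim($includesDir, '/\\');
    $map = array();
    $files = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($root, FilesystemIterator::SKIP_DOTS)
    );
    foreach ($files as $file) {
        $name = $file->getFilename();
        if ($file->getExtension() !== 'php' || in_array($name, $exclude, true)) {
            continue;
        }
        // Store the path relative to the includes directory.
        $map[$file->getBasename('.php')] = substr($file->getPathname(), strlen($root) + 1);
    }
    ksort($map);
    file_put_contents($mapFile, '<?php return ' . var_export($map, true) . ';' . "\n");
    return $map;
}

// Hypothetical autoloader: one array lookup, no directory search at runtime.
function register_map_autoloader($includesDir, $mapFile)
{
    $map = require $mapFile;
    $root = rtrim($includesDir, '/\\');
    spl_autoload_register(function ($class) use ($map, $root) {
        if (isset($map[$class])) {
            require_once $root . DIRECTORY_SEPARATOR . $map[$class];
        }
    });
}
```

The generator would be run by hand (or from a deploy step) whenever files are added or moved; if the map file itself lives inside the scanned directory, it belongs in the exclusion list.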
  • Zend Framework's approach is to autoload based on the PEAR folder standard (Class_Foo maps to /Class/Foo.php); however, rather than using a set base path, it uses the include_path.

    The problem with their approach is that there's no way to check beforehand whether a file exists, so the autoloader will try to include a file that doesn't exist in any of the include_path directories, error out, and never give any other autoload functions registered with spl_autoload_register a chance to include the file.

    So a slight deviation is to manually provide an array of base paths where the autoload can expect to find classes setup in the PEAR fashion and just loop over the base paths:

    <?php
    //...
    foreach( $paths as $path )
    {
        if( file_exists($path . $classNameToFilePath) )
        {
            include $path . $classNameToFilePath;
            break; // stop at the first base path that contains the file
        }
    }
    //...
    ?>
    

    Granted, you'll still be searching, but for each autoload you'll do at worst n checks, where n is the number of base paths you are checking.

    But if you find yourself still having to recursively scan directories the question is not "Will autoload hurt my performance," the question should be "why am I tossing my class files around in a random structure?" Sticking to the PEAR structure will save you so many headaches, and even if you decide to go with manually doing your includes as opposed to autoload, there will be no guessing as to where the class files are located when you do your include statements.

    From dcousineau
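
On PHP 5.3.2 and later, stream_resolve_include_path() offers one way to sidestep the failed-include problem described above: it looks a relative path up against the include_path and returns false on a miss, so the autoloader can return quietly and let the next registered one try. The sketch below assumes PEAR-style names; register_include_path_autoloader() is a hypothetical name, not Zend Framework's API.

```php
<?php
// Hypothetical autoloader that checks the include_path before including.
function register_include_path_autoloader()
{
    spl_autoload_register(function ($class) {
        // "Class_Foo" => "Class/Foo.php"
        $relative = str_replace('_', DIRECTORY_SEPARATOR, $class) . '.php';
        // Returns the resolved absolute path, or false if the file is not
        // found in any include_path entry.
        $resolved = stream_resolve_include_path($relative);
        if ($resolved !== false) {
            require_once $resolved;
        }
    });
}
```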
