Saturday, January 15, 2011

Web-based Log Parser for mod_log_sql

Our frontend web servers now log all of our web traffic to MySQL using mod_log_sql, freeing up thousands of "CustomLog" directives in our Apache config (we're running between 600 and 900 virtual hosts on our servers now).

That said, I'm trying to find a reasonable web log analyzer that works with mod_log_sql. I've used Webalizer and AWStats for years and I really like them; however, neither tool supports SQL-based logging.

It doesn't have to be real-time, but it does at least have to be able to pull data from a database table.

Anyone have any suggestions?
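One stopgap in the meantime: dump a day's worth of rows back out of MySQL in Combined Log Format and point Webalizer or AWStats at the resulting file, exactly as if Apache had written it. Here's a rough sketch of that idea in Python; the column names are assumed from mod_log_sql's default schema and the connection details are placeholders, so check both against your own access_log table.

    #!/usr/bin/env python
    # Rough sketch: dump one day of mod_log_sql rows as Combined Log Format
    # so file-based analyzers (Webalizer, AWStats) can chew on them.
    # Assumptions: a table named "access_log" with mod_log_sql's default
    # columns, time_stamp stored as a unix timestamp, placeholder credentials.
    import sys
    import time
    import MySQLdb

    def dump_day(day):
        """Print one local-time day ("YYYY-MM-DD") in Combined Log Format."""
        start = int(time.mktime(time.strptime(day, "%Y-%m-%d")))
        end = start + 86400
        db = MySQLdb.connect(host="localhost", user="loguser",
                             passwd="secret", db="apachelogs")
        cur = db.cursor()
        cur.execute(
            "SELECT remote_host, remote_user, time_stamp, request_line, "
            "status, bytes_sent, referer, agent FROM access_log "
            "WHERE time_stamp >= %s AND time_stamp < %s ORDER BY time_stamp",
            (start, end))
        for host, user, ts, req, status, size, ref, agent in cur:
            when = time.strftime("%d/%b/%Y:%H:%M:%S +0000", time.gmtime(ts))
            print('%s - %s [%s] "%s" %s %s "%s" "%s"' % (
                host, user or "-", when, req, status,
                size if size is not None else "-", ref or "-", agent or "-"))

    if __name__ == "__main__":
        dump_day(sys.argv[1])

Run it as, say, "dump_day.py 2011-01-14 > access.log.20110114" and feed the file to the analyzer as usual; adding "AND virtual_host = %s" to the WHERE clause would give per-site reports.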

  • There is a PHP script called Skeith that does what you want.

    You can download it here: http://skeith.sourceforge.net/

    Here is a snip from the site:

    Skeith is a simple log analyzer and reporter. Specifically, Skeith works for the mod_log_sql module for Apache (it should work for mod_log_mysql too, but thus far testing has only been done with mod_log_sql).

    Skeith's main feature, the one that sets it apart from other log analyzers, is that it can generate the log file for a given day or month on the fly. This way the sysadmin can look at the exact requests that may be questionable or harmful.

    From Freddy
  • I would not recommend storing logs in any kind of SQL database. SQL storage engines are simply not fit for this: as the amount of data grows (and it surely will with nearly 1,000 virtual hosts), write speed will suffer badly. Deleting from the database is also a painful operation, as the table becomes fragmented, further increasing read/write latency and reducing throughput.

    If you insist on storing logs in an SQL database, you will have to do your best to filter out as much unimportant data as you can.

    LapTop006 : Actually, even fairly basic machines (e.g. laptops) can do ~100k lines a minute logging to MySQL. Yes, it adds overhead, but really not that much.
    : I agree with the above comment. We use PostgreSQL for our web logs: two PostgreSQL servers on RAID 10 handle the Apache logs, among other databases, for over 900 web sites across 6 servers. If/when we run up against a database performance issue, there are caching tools available for databases, such as pgpool and memcached. That said, asdmin's point is valid: it has the potential to be a bottleneck if not monitored and/or set up correctly.
    asdmin : LapTop006: of course, writing raw data to a database is possible at incredible speed (as long as no indexes are used and no data is deleted), but as soon as data gets deleted (fragmented tables), indexes are introduced (an insertion nightmare), or a fairly simple "like" string comparison runs over the raw text, the database turns out to be useless. And archiving from SQL tables is a pain, since nobody wants to let an SQL database grow forever. Archiving also means deletion, so fragmented tables are back in business... (one DROP-based way around this is sketched just after this thread)
    From asdmin
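For what it's worth, the fragmentation problem asdmin describes has a common DELETE-free workaround: rotate the live table once a month with an atomic RENAME and expire old months with a cheap DROP TABLE, so no mass DELETE ever fragments anything. Below is a minimal sketch of that scheme in Python; the table name access_log, the credentials, and the cron-style usage are assumptions for illustration, not anything mod_log_sql ships with.

    #!/usr/bin/env python
    # Sketch: DELETE-free rotation for an SQL access log. Once a month the
    # live table is swapped out under an archive name (RENAME TABLE is atomic
    # in MySQL, so concurrent INSERTs never see a missing table), and months
    # past the retention window are expired with DROP TABLE, which frees
    # space immediately instead of fragmenting a big table with DELETEs.
    import datetime
    import MySQLdb

    KEEP_MONTHS = 6  # archived months to keep before dropping

    def rotate(db, today=None):
        today = today or datetime.date.today()
        cur = db.cursor()

        # 1. Archive: stamp the live table with last month's YYYYMM and slide
        #    a fresh, empty clone into its place in one atomic rename.
        last = today.replace(day=1) - datetime.timedelta(days=1)
        stamp = "%04d%02d" % (last.year, last.month)
        cur.execute("CREATE TABLE access_log_new LIKE access_log")
        cur.execute("RENAME TABLE access_log TO access_log_%s, "
                    "access_log_new TO access_log" % stamp)

        # 2. Expire: drop whole archive tables older than the window.
        y, m = today.year, today.month - KEEP_MONTHS
        while m <= 0:
            y, m = y - 1, m + 12
        cutoff = "%04d%02d" % (y, m)
        cur.execute("SHOW TABLES LIKE 'access_log_%'")
        for (name,) in cur.fetchall():
            suffix = name.rsplit("_", 1)[-1]  # isdigit() filters "_new" etc.
            if suffix.isdigit() and suffix < cutoff:
                cur.execute("DROP TABLE `%s`" % name)

    if __name__ == "__main__":
        rotate(MySQLdb.connect(host="localhost", user="loguser",
                               passwd="secret", db="apachelogs"))

Run it from cron on the first of each month; logging continues uninterrupted through the swap, and archiving a month off-server becomes a mysqldump of one closed table rather than a DELETE against the live one.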
