I have a table with roughly 100.000 blog postings, linked to a table with 50 feeds via an 1:n relationship. When I query both tables with a select statement, ordered by a datetime field of the postings table, MySQL always uses filesort, resulting in very slow query times (>1 second). Here's the schema of the postings
table (simplified):
+---------------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| feed_id | int(11) | NO | MUL | NULL | |
| crawl_date | datetime | NO | | NULL | |
| is_active | tinyint(1) | NO | MUL | 0 | |
| link | varchar(255) | NO | MUL | NULL | |
| author | varchar(255) | NO | | NULL | |
| title | varchar(255) | NO | | NULL | |
| excerpt | text | NO | | NULL | |
| long_excerpt | text | NO | | NULL | |
| user_offtopic_count | int(11) | NO | MUL | 0 | |
+---------------------+--------------+------+-----+---------+----------------+
And here's the feed
table:
+-------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| type | int(11) | NO | MUL | 0 | |
| title | varchar(255) | NO | | NULL | |
| website | varchar(255) | NO | | NULL | |
| url | varchar(255) | NO | | NULL | |
+-------------+--------------+------+-----+---------+----------------+
And here's the query that takes >1 second to execute. Please note that the post_date
field has an index, but MySQL isn't using it to sort the postings table:
SELECT
`postings`.`id`,
UNIX_TIMESTAMP(postings.post_date) as post_date,
`postings`.`link`,
`postings`.`title`,
`postings`.`author`,
`postings`.`excerpt`,
`postings`.`long_excerpt`,
`feeds`.`title` AS feed_title,
`feeds`.`website` AS feed_website
FROM
(`postings`)
JOIN
`feeds`
ON
`feeds`.`id` = `postings`.`feed_id`
WHERE
`feeds`.`type` = 1 AND
`postings`.`user_offtopic_count` < 10 AND
`postings`.`is_active` = 1
ORDER BY
`postings`.`post_date` desc
LIMIT
15
The result of the explain extended
command on this query shows that MySQL is using filesort:
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
| 1 | SIMPLE | postings | ref | feed_id,is_active,user_offtopic_count | is_active | 1 | const | 30996 | Using where; Using filesort |
| 1 | SIMPLE | feeds | eq_ref | PRIMARY,type | PRIMARY | 4 | feedian.postings.feed_id | 1 | Using where |
+----+-------------+----------+--------+---------------------------------------+-----------+---------+--------------------------+-------+-----------------------------+
When I remove the order by
part, MySQL stops using filesort. Please let me know if you have any ideas on how to optimize this query to get MySQL to sort and select the data by using indexes. I have already tried a few things such as creating a combined index on all where/order by fields, as suggested by a few blog postings, but this didn't work either.
-
MySQL has two filesort algorithms: an older filesort that sorts records on disk, and a new version that works in memory.
If it cannot use an index on the first table in the join to sort the query, it will have to do a filesort. If resultset before sorting converted to fixed-width format is greater than the sort buffer OR if it contains any text fields, it will have to use the slower on-disk filesort algorithm (the second condition is satisfied since your query has a text field).
MySQL is choosing to use the is_active column, ostensibly because it thinks that column is most selective in eliminating rows before it continues with the other joins and where conditions. The first thing I would suggest would be to try creating composite indexes with post_date, feed_id, and the columns in the where condition, e.g. (is_active, user_offtopic_count, post_date, feed_id).
Dennis G. : Thanks for the explanation! -
Create a composite index either on
postings (is_active, post_date)
(in that order).It will be used both for filtering on
is_active
and ordering bypost_date
.MySQL
should showREF
access method over this index inEXPLAIN EXTENDED
.Note that you have a
RANGE
filtering condition overuser_offtopic_count
, that's why you cannot use an index over this field both in filtering and in sorting by other field.Depending on how selective is your
user_offtopic_count
(i. e. how many rows satisfyuser_offtopic_count < 10
), it may be more useful to create an index onuser_offtopic_count
and let the post_dates be sorted.To do this, create a composite index on
postings (is_active, user_offtopic_count)
and make sure theRANGE
access method over this index is used.Which index will be faster depends on your data distribuion. Create both indexes,
FORCE
them and see which is faster:CREATE INDEX ix_active_offtopic ON postings (is_active, user_offtopic_count); CREATE INDEX ix_active_date ON postings (is_active, post_date); SELECT `postings`.`id`, UNIX_TIMESTAMP(postings.post_date) as post_date, `postings`.`link`, `postings`.`title`, `postings`.`author`, `postings`.`excerpt`, `postings`.`long_excerpt`, `feeds`.`title` AS feed_title, `feeds`.`website` AS feed_website FROM `postings` FORCE INDEX (ix_active_offtopic) JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id` WHERE `feeds`.`type` = 1 AND `postings`.`user_offtopic_count` < 10 AND `postings`.`is_active` = 1 ORDER BY `postings`.`post_date` desc LIMIT 15 /* This should show RANGE access with few rows and keep the FILESORT */ SELECT `postings`.`id`, UNIX_TIMESTAMP(postings.post_date) as post_date, `postings`.`link`, `postings`.`title`, `postings`.`author`, `postings`.`excerpt`, `postings`.`long_excerpt`, `feeds`.`title` AS feed_title, `feeds`.`website` AS feed_website FROM `postings` FORCE INDEX (ix_active_date) JOIN `feeds` ON `feeds`.`id` = `postings`.`feed_id` WHERE `feeds`.`type` = 1 AND `postings`.`user_offtopic_count` < 10 AND `postings`.`is_active` = 1 ORDER BY `postings`.`post_date` desc LIMIT 15 /* This should show REF access with lots of rows and no FILESORT */
Dennis G. : That did the trick for me, thanks a lot! I had to use force index to get the best index used. We are now using multiple combined indexes for the different queries. -
Also, it's important to remember that MySQL won't use an index if the column you're ordering by has a function applied to it.
You should also try aliasing postings.post_date as something else. This will tell MySQL to order by the unaltered column, and you'll still select the unix timestamp.
0 comments:
Post a Comment