Hi All,
I have a MySQL indexing question for you guys.
I've got a very large table (~100 million records) in MySQL that contains information about files. Most of the queries I run on it involve substring operations on the file path column.
Here's the table DDL:
CREATE TABLE `filesystem_data`.`$tablename` (
`file_id` INT( 14 ) NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`file_name` VARCHAR( 256 ) NOT NULL ,
`file_share_name` VARCHAR ( 100 ) NOT NULL,
`file_path` VARCHAR( 900 ) NOT NULL ,
`file_size` BIGINT( 14 ) NOT NULL ,
`file_tier` TINYINT(1) UNSIGNED NULL,
`file_last_access` DATETIME NOT NULL ,
`file_last_change` DATETIME NOT NULL ,
`file_creation` DATETIME NOT NULL ,
`file_extension` VARCHAR( 50 ) NULL ,
INDEX ( `file_path`, `file_share_name` )
) ENGINE = MYISAM;
So, for example, I'll have a row with a file_path like:
'\\Server100\share2\Home\Zenshai\My Documents\'
And I'll extract the user's name (Zenshai in this example) with something like:
SELECT substring_index(substring_index(fp.file_path,'\\',6),'\\',-1) as Username
FROM (SELECT '\\\\Server100\\share2\\Home\\Zenshai\\My Documents\\' as file_path) fp
It gets a bit ugly, but that's not really my concern right now.
What I'd like some advice on is what kind of index (if any at all) can help speed up these types of queries on this table. Any other suggestions are welcome too.
Thanks.
PS. Although the table gets very large there is enough space for indexes.
-
You cannot use indices with your current table design.
You may add a column called USERNAME, fill it in the INSERT/UPDATE trigger with the expression you use in SELECT, and search on this column.
P.S. Just curious, you really have 100 mln+ files on your server?
Zenshai: It's not just one server, and it's not 'mine' at all. But yes, a lot more actually.
Zenshai: Also, thank you for your answer. I will try that; it will probably be worth the extra time spent inserting to have faster queries.
-
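For reference, the trigger-maintained USERNAME column suggested in the first answer could look roughly like the sketch below. It is only a sketch under assumptions: the table name `files` is a hypothetical stand-in for `$tablename`, the column size and the index and trigger names are made up, and it assumes the user name is always the sixth backslash-delimited piece of file_path (as in the question's example) and that the default backslash-escaping sql_mode is in effect. Triggers need MySQL 5.0.2 or later.

```sql
USE filesystem_data;

-- `files` is a hypothetical name standing in for `$tablename`.
-- Add the derived column first; VARCHAR(100) is an assumed size.
ALTER TABLE files ADD COLUMN username VARCHAR(100) NULL;

-- One-off backfill using the same expression as the question's SELECT.
-- On ~100 million rows this will take a while.
UPDATE files
SET username = SUBSTRING_INDEX(SUBSTRING_INDEX(file_path, '\\', 6), '\\', -1);

-- Index the column after the backfill so the UPDATE does not also
-- have to maintain the index row by row.
ALTER TABLE files ADD INDEX idx_username (username);

-- Keep the column in sync for new and changed rows.
CREATE TRIGGER files_before_insert BEFORE INSERT ON files
FOR EACH ROW
  SET NEW.username = SUBSTRING_INDEX(SUBSTRING_INDEX(NEW.file_path, '\\', 6), '\\', -1);

CREATE TRIGGER files_before_update BEFORE UPDATE ON files
FOR EACH ROW
  SET NEW.username = SUBSTRING_INDEX(SUBSTRING_INDEX(NEW.file_path, '\\', 6), '\\', -1);

-- Queries can now filter on the indexed column instead of
-- computing a substring for every row.
SELECT file_name, file_size
FROM files
WHERE username = 'Zenshai';
```

Rows whose path has a different depth would get the wrong value in this column, so the extraction expression may need adjusting per share layout.
-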
I'd create a tiny (columns, not record count) subtable that would have the file path broken out and stored like so:
FK_TO_PARENT  PATH_PART
1             Server100
1             share2
1             Home
1             Zenshai
1             My Documents
And then just index PATH_PART. Of course, if the parent table is 100 million plus, then this would be going into the billions of records.
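-
A rough sketch of the path-part subtable idea, for illustration only. The table name `file_path_parts`, the parent table name `files`, the column sizes, and the `part_pos` column are all assumptions that are not in the answer above; `part_pos` is added here so a query can ask for "the fourth piece" rather than any piece that happens to match.

```sql
USE filesystem_data;

-- Hypothetical child table: one row per path component.
-- MyISAM does not enforce foreign keys, so file_id is a logical
-- reference to the parent table's file_id only.
CREATE TABLE file_path_parts (
  file_id   INT NOT NULL,
  part_pos  TINYINT UNSIGNED NOT NULL,   -- 1-based position in the path (assumed column)
  path_part VARCHAR(255) NOT NULL,
  PRIMARY KEY (file_id, part_pos),
  INDEX idx_path_part (path_part, part_pos)
) ENGINE = MYISAM;

-- Rows for '\\Server100\share2\Home\Zenshai\My Documents\'
INSERT INTO file_path_parts (file_id, part_pos, path_part) VALUES
  (1, 1, 'Server100'),
  (1, 2, 'share2'),
  (1, 3, 'Home'),
  (1, 4, 'Zenshai'),
  (1, 5, 'My Documents');

-- Find files whose fourth path component is a given user name.
-- `files` is a hypothetical name standing in for `$tablename`.
SELECT f.file_name, f.file_size
FROM file_path_parts AS p
JOIN files AS f ON f.file_id = p.file_id
WHERE p.path_part = 'Zenshai'
  AND p.part_pos = 4;
```

The trade-off is the one the answer already mentions: several child rows per file, so the child table grows to a multiple of the parent's row count.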