Disk Monitor Crawl Patterns

I am using Disk Monitor to crawl a file system for a document migration project. I currently use ".*" as the crawl pattern, and this obviously returns all files, but it needs to be changed to return only specific files by extension, to cut down on the number of files crawled. I've tried "*.doc", "*.pdf" etc., but these crawl patterns return 0 files, even though there are definitely ".doc" and ".pdf" files in the path. So the question is: does anyone know what the syntax is for the crawl patterns in Disk Monitor?

Cheers
Andy

It may be a bit more complicated...

Adrian,
thanks for the quick reply. Unfortunately standard Regular Expressions don't seem to work either; Ross and I have tried the above but it still returns 0 files. Paul thinks it may be something to do with putting the root folder in too, but we'll probably need to wait for Stew to get back for the full answer.

Cheers
Andy

Re: It may be a bit more complicated...

Hi,
I'm not really here, and you didn't see me, right? :)
Disk Monitor uses the same logic as the Migrator's Web Crawl for patterns. It does use regular expressions, and uses the .NET IsMatch() method to check whether a path matches a pattern or not. From the documentation it would appear that if the pattern matches anywhere in the path (including the filename), the path matches that pattern. If you are getting 0 returned, I would suggest changing your logging level to DEBUG, as this will then tell you which rule the path is being excluded on, or whether it hasn't matched any rule at all.

Stewart.

Who What Where...

Hi Stew,
when I use ".*\.doc" as the crawl pattern, I see the message that the crawl has started and then something like

Excluding \\fileserver\path1\path2\path3\path4 due to no include matched

And then obviously no files are returned.

Cheers
Andy

Log file please

Can you please post the exact log...?

Regards,

No File Types

After some investigation with Disk Monitor crawl patterns I have concluded the following:
So we could have multiple exclude patterns, one for each file type we do not want, leaving only the file types we do want. But we cannot create a list of only the files we want unless we have an absolute include for each folder we are going to crawl, and we are unlikely to know those folders before a crawl.
If I have this correct, perhaps we should have crawl patterns that include/exclude folders and another include/exclude list for files. This way we could include all folders and exclude specific file types.
Do I have this logic correct?

Not quite correct.

The Disk Monitor crawl patterns limit folder selection and completely ignore file selection. I will raise this as an enhancement request for v2.12 (end of July).

Regards,
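
To make the behaviour described above concrete, here is a minimal sketch in Python. It is only a model built from what is said in this thread, not Disk Monitor's actual code: the function names `folder_is_crawled` and `select_files` are made up for illustration, and Python's `re.search` stands in for the .NET IsMatch() call (both report a match anywhere in the input). The second function sketches the suggested split into separate folder and file pattern lists, which is an assumption about how such an enhancement might look.

```python
import re

def folder_is_crawled(folder_path, include_patterns):
    """Hypothetical model: a folder is crawled only if some include
    pattern matches somewhere in the folder path."""
    return any(re.search(p, folder_path) for p in include_patterns)

def select_files(folder_path, file_names, folder_includes, file_includes):
    """Hypothetical two-list scheme: folder patterns pick the folders,
    a separate list of file patterns picks the files inside them."""
    if not folder_is_crawled(folder_path, folder_includes):
        return []
    return [
        name for name in file_names
        if any(re.search(p, name, re.IGNORECASE) for p in file_includes)
    ]

folder = r"\\fileserver\path1\path2\path3\path4"

# The folder path contains no ".doc", so a file-style pattern excludes it,
# which is consistent with the "no include matched" log message.
print(folder_is_crawled(folder, [r".*\.doc"]))  # False
print(folder_is_crawled(folder, [r".*"]))       # True

# With separate lists, every folder is included and files are filtered.
print(select_files(folder, ["a.doc", "b.pdf", "c.txt"],
                   [r".*"], [r"\.(doc|pdf)$"]))  # ['a.doc', 'b.pdf']
```

If the model is right, it explains why ".*" returns everything while ".*\.doc" returns nothing: the pattern is only ever tested against folder paths.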
Hi Andy,
I've not used Disk Monitor, but if it's using Regular Expressions I would expect ".*\.doc" to match any .doc extension and something like ".*\.((doc)|(pdf))" to match either .pdf or .doc.
Cheers!
Adrian.
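
As a rough check of the two patterns suggested above, the sketch below runs them against a few made-up file names. It uses Python's `re.search` rather than .NET, on the assumption that the two behave alike for these simple patterns; the helper `matching` and the sample names are purely illustrative. Note that "*.doc" is glob syntax rather than a regular expression, and that an unanchored pattern also matches names that merely contain ".doc", so a trailing `$` is shown as one way to tighten it.

```python
import re

# Made-up sample names for illustration only.
names = ["report.doc", "scan.pdf", "notes.txt", "draft.docx", "old.doc.bak"]

def matching(pattern, candidates):
    """Return the candidates in which the pattern matches anywhere."""
    return [c for c in candidates if re.search(pattern, c)]

# Unanchored: also picks up .docx and .doc.bak
print(matching(r".*\.doc", names))
# ['report.doc', 'draft.docx', 'old.doc.bak']

# Either extension, still unanchored
print(matching(r".*\.((doc)|(pdf))", names))
# ['report.doc', 'scan.pdf', 'draft.docx', 'old.doc.bak']

# Anchored to the end of the name: exact extensions only
print(matching(r"\.(doc|pdf)$", names))
# ['report.doc', 'scan.pdf']
```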