Disk Monitor Crawl Patterns

I am using Disk Monitor to crawl a file system for a document migration project.

I currently use ".*" as the crawl pattern, which obviously returns all files, but I need to change it to return only files with specific extensions, to cut down on the number of files crawled.

I've tried "*.doc", "*.pdf" etc., but these crawl patterns return 0 files, even though there are definitely ".doc" and ".pdf" files in the path. Does anyone know what the syntax is for crawl patterns in Disk Monitor?

Cheers

Andy


Hi Andy, I've not used Disk

Hi Andy,

I've not used Disk Monitor, but if it's using regular expressions I would expect ".*\.doc" to match any .doc extension and something like ".*\.((doc)|(pdf))" to match either .pdf or .doc.
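A quick way to sanity-check those two patterns outside the tool is a minimal sketch like the one below. It uses Python's re module rather than Disk Monitor itself, assumes an ordinary unanchored regex search, and the sample paths are made up for illustration.

```python
import re

# Hypothetical sample paths, for illustration only.
paths = [
    r"\\server\share\report.doc",
    r"\\server\share\manual.pdf",
    r"\\server\share\notes.txt",
]

doc_only = re.compile(r".*\.doc")
doc_or_pdf = re.compile(r".*\.((doc)|(pdf))")

# search() succeeds if the pattern matches anywhere in the string.
doc_matches = [p for p in paths if doc_only.search(p)]
doc_or_pdf_matches = [p for p in paths if doc_or_pdf.search(p)]

print(doc_matches)
print(doc_or_pdf_matches)
```

If the patterns behave like this on plain strings but still return nothing in the crawl, the problem is more likely in what the tool applies them to than in the regex syntax.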

Cheers!
Adrian.


It may be a bit more complicated...

Adrian,

Thanks for the quick reply. Unfortunately, standard regular expressions don't seem to work either: Ross and I have tried the above, but it still returns 0 files. Paul thinks it may be something to do with putting the root folder in too, but we'll probably need to wait for Stew to get back for the full answer.

Cheers

Andy


Re: It may be a bit more complicated...

Hi,

I'm not really here, and you didn't see me right? :)

Disk Monitor uses the same logic as the Migrator's Web Crawl for patterns. It does use regular expressions, and uses the .NET IsMatch() method to check whether a path matches a pattern or not. From the documentation it would appear that a pattern matches if it finds a match anywhere in the path (including the filename). If you are getting 0 returned, I would suggest changing your logging level to DEBUG, as this will then tell you which rule a path is being excluded on, or whether it hasn't matched any rule at all.
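To illustrate the "match anywhere in the path" behaviour, here is a rough sketch in Python. The is_match helper is a hypothetical stand-in for .NET's IsMatch(), built on re.search, which is likewise unanchored. It is not Disk Monitor's code, and the paths are invented.

```python
import re

def is_match(pattern, path):
    """Rough stand-in for .NET Regex.IsMatch: True if the pattern
    matches anywhere in the input string (unanchored)."""
    return re.search(pattern, path) is not None

pattern = r".*\.doc"

# A file path ending in .doc matches.
file_hit = is_match(pattern, r"\\fileserver\archive\report.doc")

# Because the match is unanchored, a longer extension also matches.
docx_hit = is_match(pattern, r"\\fileserver\archive\report.docx")

# A bare folder path contains no ".doc", so it does not match.
folder_hit = is_match(pattern, r"\\fileserver\archive")

# Anchoring with $ restricts the match to paths that end in ".doc".
anchored_docx_hit = is_match(pattern + "$", r"\\fileserver\archive\report.docx")

print(file_hit, docx_hit, folder_hit, anchored_docx_hit)
```

Note that a folder path on its own fails an extension pattern, which is worth keeping in mind when reading the DEBUG output.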

Stewart.


Who What Where...

Hi Stew,

When I use ".*\.doc" as the crawl pattern, I see the message that the crawl has started and then something like:

Excluding \\fileserver\path1\path2\path3\path4 due to no include matched

And then, obviously, no files are returned.

Cheers

Andy


Log file please

Can you please post the exact log...?

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


Exact Log

Ijonas, 

The exact message in the event log is:

Excluding \\fileserver01\Vamosa Projects\External\PQRS\Symbian due to no include matched

I've emailed you a screenshot of the message.

Cheers

Andy


No File Types

After some investigation with Disk Monitor crawl patterns, I have concluded the following:

Exclude patterns take precedence over include patterns.

We can exclude both files and folders by using a crawl pattern. For example, ".*\.pdf" will exclude all PDF files from our crawl. However, we cannot include only PDF files by setting this pattern as an include: if we attempt this, no include pattern matches the sub folders, so they are never traversed and the crawl returns 0 objects.

So we could have an exclude pattern for each file type we do not want, leaving only the file types we do want. But we cannot list only the files we want unless we have an absolute include for each folder we are going to crawl, and we are unlikely to know those folders before a crawl.

If I have this correct, perhaps we should have one set of crawl patterns that include/exclude folders and a separate include/exclude list for files. This way we could include all folders and exclude specific file types.

Do I have this logic correct?


Not quite correct.

The Disk Monitor crawl patterns limit folder selection, and completely ignore file selection. I will raise this as an enhancement request for v2.12 (end of July).
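As a rough model of why an extension pattern produces the "no include matched" message, the sketch below applies include patterns to the folder path only. It is written in Python as an illustration, not taken from Disk Monitor; folder_included and wanted are hypothetical helpers, and the file names are invented. The second half shows one possible workaround under that assumption: crawl every folder with ".*" and filter the resulting file list by extension afterwards.

```python
import re
from pathlib import PureWindowsPath

def folder_included(folder, include_patterns):
    """Hypothetical model: a folder is crawled only if some include
    pattern matches the folder path itself."""
    return any(re.search(p, folder) for p in include_patterns)

# Folder path taken from the log message earlier in the thread.
folder = r"\\fileserver01\Vamosa Projects\External\PQRS\Symbian"

# The folder path contains no ".doc", so an extension include rejects it.
by_extension = folder_included(folder, [r".*\.doc"])
# ".*" matches any folder, so the crawl proceeds.
match_all = folder_included(folder, [r".*"])

def wanted(path, extensions=frozenset({".doc", ".pdf"})):
    """Keep a file if its extension is in the wanted set (case-insensitive)."""
    return PureWindowsPath(path).suffix.lower() in extensions

# Invented file names standing in for the output of a ".*" crawl.
crawled = [folder + r"\spec.doc", folder + r"\build.log", folder + r"\guide.PDF"]
kept = [p for p in crawled if wanted(p)]

print(by_extension, match_all)
print(kept)
```

This does not reduce the number of files crawled, only the number passed on to later steps, so it is a stopgap until file-level patterns are available.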

Regards,
Ijonas Kisselbach.
Chief Technology Officer.