Batch Sizes

I have a couple of questions regarding how Vamosa Server (or ActiveMQ) manages batches.

First of all, what is the general consensus on batch sizes? The default batch size is 25. Do people generally leave it at that, or would you adjust the batch size according to the number of objects being operated on?

I've noticed some strange behaviour with batches when running against a project with 94K objects, whereas previously, with just a few thousand objects, there was no issue. The first execution of a task seems to grind to a halt during the 2nd batch and just hangs there until we stop and restart the server, which kick-starts the task. Apparently this is a common issue; is there a known reason for it?

I've also noticed that the query a task is set to run against returns far more objects than end up in the batches, and we have to run the same task again, excluding the objects that have already been processed. Again, is this a known problem?


RE: Batch Sizes

Hi Adrian,

In general terms, the larger the batch size, the fewer batches are required, whereas the smaller the batch size, the more commit statements are issued against the database.

When changing the batch size, there are two things I would consider, both relating to the amount of metadata being set (updated or inserted) on each object:

1. The amount of metadata has a direct effect on how long a batch takes to commit. If the task performs a significant number of metadata inserts and updates, a smaller batch size (25) would be fine.

2. If the task adds little metadata, I would increase the batch size (255 is the maximum), as this reduces the number of batches required and little time is lost in each commit.

From past experience I think a batch size of 75 works well.
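To make the trade-off concrete, here is a small sketch of the arithmetic. It assumes every batch is filled to the configured size except possibly the last one (the 60,452-row job later in this thread, at a batch size of 25, produced 2,419 batches, which fits that assumption). The 94,000 figure is only an approximation of the "94K" project.

```python
def batch_count(num_objects: int, batch_size: int) -> int:
    """Batches needed if every batch is full except possibly the last one."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Integer ceiling division: avoids floating-point rounding on large counts.
    return (num_objects + batch_size - 1) // batch_size


# "94K" in this thread is approximate; 94,000 is used for illustration.
objects = 94_000
for size in (25, 75, 255):
    # Per Ross's description, each batch means another commit on the database.
    print(f"batch size {size:>3}: {batch_count(objects, size):>5} batches")
```

This prints 3760, 1254 and 369 batches respectively, so moving from 25 to 255 cuts the number of commits by roughly a factor of ten, at the cost of each commit carrying ten times as much metadata.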

Regarding the batch hanging during the execution of a task, can you check that you have the updated settings applied?

You can also monitor the number of batches that have been processed and that are still to be processed in JConsole, which can be useful.

I have never seen the query return more objects than are actually processed.

Can you check that you are using the correctly defined query library (batchPrepare.queryLibrary) for the project, and that you are checking against the 'batchPrepare.query' property and not the 'import.query' property?

Could there also be a condition in your script that excludes some objects from processing?

 


Hi Ross, Thanks for the

Hi Ross,

Thanks for the reply. Changing the batch size based on the amount of metadata being updated sounds reasonable; I will have to experiment with this.

Yes, we are using the properties you mentioned. Paul had already pointed us to them, and they were in fact what we already had set.

Steve O showed me JConsole, which is indeed very useful.

We ran the same query again with the same script, without modifying either, and the second time through it picked up all the items it had missed the first time. I assume it is possible that having to kill and restart the server could result in missed items, but if that were the case I would expect it to miss only a batch's worth (e.g. 25 items). In our case it had missed about 20K of the 94K objects in the project.


Debug script

Hi Adrian,

I think there is most likely a condition in the task script you are running that causes these objects to be missed the first time around.

If possible, I would take one object that fails the first time around and add some debug output to the script to make sure it goes through 'all the right places'. Then, when you run it through the second time, you should notice some different output in the server window.

By taking a small set of the objects that fail the first time, you should get a better indication of what has changed on an object that makes it process fully the second time.

I would also get a total of how many objects you expect from the query before you run the script, then view JConsole when you start the task. This should show the total number of batches, and I would expect that number to be consistent with the total from your query output (roughly the object count divided by the batch size). If it is, then you are processing the number of objects you expect, and something in the script must change the objects so that they are fully processed the second time around.

Hope this helps,

Ross


Hi Ross, We did have a total

Hi Ross,

We did have a total number of items we were expecting to see processed. This number came from the same query we were using for the task. It was only after we did a calculation based on the number of batches and compared it to the number returned from the query that we noticed the discrepancy.

So it seems to me that the issue is definitely in the batch processing rather than in the query used.

I don't particularly want to run anything over 94K objects again if I can avoid it, but it's looking likely that we may have to do another crawl for this project. If we get the chance, I'll try to keep an eye out for it happening again and make sure I keep track of any strangeness.


Strange?

Hi Adrian,

 

This is strange; I've never seen this before. It sounds as though there may be a limit to the number of batches that VCM is able to handle, but I cannot confirm this.

I would try increasing your batch size to 255; that way fewer batches will be generated.

It sounds as though further investigation is required, though. Can you post any further information you get regarding this issue on Groups, preferably in this thread?

 

This way we will have more information to work with when we try to replicate the issue.

 

Ross


Re: Strange?

Hi,

To put my 2c worth in... could you also confirm the numbers based on some criterion on the content, i.e. a metadata field that should be set, rather than on the number of batches? Are you getting the number of batches from JConsole? If so, I wouldn't use that to work out a total number of batches. It should only be used to determine the number remaining, as it may fire some through without registering them in the queue.

Stewart.


Hi Stewart,   Yes we did

Hi Stewart,

 

Yes we did confirm after the script had run that there were still some items that were being returned by the query which hadn't been touched by the script. The script was initialising a metadata field on each item to 'Y' and only setting it to 'N' under certain conditions.

After the script had run we were expecting to see every item returned by the query have a value for this metadata field but this wasn't the case. It just so happened that we'd also been watching the number of batches in JConsole and they didn't seem to add up.

The query we were using is a little convoluted but it definitely returned more than was processed in the first run and running the script again with the same query a second time only picked up the records that had been missed first time around.

The task and the query were both being run at master project level, the query is as follows:

FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.metadataSet md
where cd.project.id=:projectId
and md.attribute='Identify Metadata.Content-Type'
and md.value like '%text/html%'
and cd.id not in
  (select cdx.id FROM com.vamosa.content.ContentDescriptor cdx
  JOIN cdx.metadataSet mdx
  where cdx.project.id=:projectId
  and mdx.attribute='vamosa.include'
  and (mdx.value = 'N' or mdx.value = 'Y'))
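Checking a metadata field that the script always sets, as Stewart suggests, is what the query above does. A toy sketch of the same NOT IN logic in plain SQL, run against an in-memory SQLite database, may help when reasoning about which items should come back. The table and column names (content_descriptor, metadata, cd_id, project_id) are hypothetical stand-ins; the real VCM schema behind ContentDescriptor and its metadataSet will differ.

```python
import sqlite3

# Hypothetical, heavily simplified schema; not the real VCM tables.
SCHEMA = """
CREATE TABLE content_descriptor (id INTEGER PRIMARY KEY, project_id INTEGER);
CREATE TABLE metadata (cd_id INTEGER, attribute TEXT, value TEXT);
"""

# SQL analogue of the HQL above: HTML items with no vamosa.include flag yet.
UNPROCESSED = """
SELECT cd.id
FROM content_descriptor cd
JOIN metadata md ON md.cd_id = cd.id
WHERE cd.project_id = :project_id
  AND md.attribute = 'Identify Metadata.Content-Type'
  AND md.value LIKE '%text/html%'
  AND cd.id NOT IN (
      SELECT cdx.id
      FROM content_descriptor cdx
      JOIN metadata mdx ON mdx.cd_id = cdx.id
      WHERE cdx.project_id = :project_id
        AND mdx.attribute = 'vamosa.include'
        AND mdx.value IN ('N', 'Y'))
ORDER BY cd.id
"""


def unprocessed_ids(conn, project_id):
    """IDs of HTML items in the project that the script has not flagged."""
    return [row[0] for row in conn.execute(UNPROCESSED, {"project_id": project_id})]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Four HTML items in project 1; the script has flagged item 1 ('Y') and item 2 ('N').
for cd_id in (1, 2, 3, 4):
    conn.execute("INSERT INTO content_descriptor VALUES (?, 1)", (cd_id,))
    conn.execute("INSERT INTO metadata VALUES (?, 'Identify Metadata.Content-Type', "
                 "'text/html; charset=UTF-8')", (cd_id,))
conn.execute("INSERT INTO metadata VALUES (1, 'vamosa.include', 'Y')")
conn.execute("INSERT INTO metadata VALUES (2, 'vamosa.include', 'N')")
print(unprocessed_ids(conn, 1))  # [3, 4]: the items the script never touched
```

Because the script initialises the flag to 'Y' on every item it visits, any ID this query still returns after a run is an item that was never handed to the script at all, which is what points at batching rather than script logic.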

 

P.S. How do I make something format as code?


Formatting as Code

To format as code, go to the Format drop-down box and select Preformatted.

 Like this (get your reading glasses ready!):

blah blah blah 


Working out batches and batch items

You can see exactly what gets processed in a given job by looking at the contents of the JOB, BATCH, and BATCHITEM tables. No need to use complicated stuff such as JConsole.
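As a sketch of what that looks like, the following builds toy versions of those tables in an in-memory SQLite database and runs two queries: batch counts per state for one job, and QUEUED batches that have no batchitem rows left. Only job.id, batch.jobid, batch.state and batchitem.batchid come from this thread; batch.id and the rest of the schema are assumptions, and the real tables will have more columns.

```python
import sqlite3

# Toy tables: only job.id, batch.jobid, batch.state and batchitem.batchid are
# taken from this thread; batch.id and the rest are assumed.
SCHEMA = """
CREATE TABLE job (id TEXT PRIMARY KEY);
CREATE TABLE batch (id INTEGER PRIMARY KEY, jobid TEXT, state TEXT);
CREATE TABLE batchitem (id INTEGER PRIMARY KEY, batchid INTEGER);
"""

STATE_COUNTS = """
SELECT b.state, COUNT(*) FROM batch b
WHERE b.jobid = ? GROUP BY b.state ORDER BY b.state
"""

# Batches still marked QUEUED although no batchitem rows remain for them.
EMPTY_QUEUED = """
SELECT b.id FROM batch b
WHERE b.jobid = ? AND b.state = 'QUEUED'
  AND NOT EXISTS (SELECT 1 FROM batchitem bi WHERE bi.batchid = b.id)
ORDER BY b.id
"""


def job_report(conn, job_id):
    """Return ({state: batch count}, [ids of emptied-but-QUEUED batches])."""
    counts = dict(conn.execute(STATE_COUNTS, (job_id,)).fetchall())
    empty = [row[0] for row in conn.execute(EMPTY_QUEUED, (job_id,))]
    return counts, empty


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO job VALUES ('job-1')")
conn.executemany("INSERT INTO batch VALUES (?, 'job-1', ?)",
                 [(1, "FINISHED"), (2, "QUEUED"), (3, "QUEUED")])
# Batch 3 still has work; batch 2 has been emptied but never marked FINISHED.
conn.executemany("INSERT INTO batchitem (batchid) VALUES (?)", [(3,), (3,)])
print(job_report(conn, "job-1"))  # ({'FINISHED': 1, 'QUEUED': 2}, [2])
```

The two SELECT statements use only standard SQL (GROUP BY, NOT EXISTS), so adapting them to SQL Server, Oracle or DB2 should mostly mean adjusting table names, though that is untested here.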

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


Conjecture

Why would there be a limit on the number of batches VCM is able to handle? The relatively low number of objects (94K) would not support that statement.

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


That is my question... why

That is my question... why would there be?

None of the other suggested possibilities seem to offer a valid reason for it. Has nobody else ever encountered a similar scenario before?


A quick search on 'Groups

A quick search on Groups tells me this is a newly encountered problem. I certainly haven't heard of anything like it.

How about describing your environment...cuz you're not giving us much to work with so far...

Database?

Where's the app server located? What OS?

Where's the client located?

What's the networking between these components?

 

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


Not a new problem!

I've been following this post with great interest and I guess it's time for me to throw my hat in the ring.

This is the 3rd project I've been on this year where I've seen this issue of not-enough-batches-getting-created manifest itself. I don't recall the issue happening in pre-2.11 versions of VCM.

The first ingredient of the problem is a pipeline hanging after processing one or more batches.

Then I'll restart VCM and create a batchPrepare.query designed to run against only those content descriptors that haven't been processed by the script. The query is basically the same as the one Adrian posted above, i.e.:

FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.metadataSet md
where cd.project.id=:projectId
and md.attribute='Identify Metadata.Content-Type'
and md.value like '%text/html%'
and cd.id not in
  (select cdx.id FROM com.vamosa.content.ContentDescriptor cdx
  JOIN cdx.metadataSet mdx
  where cdx.project.id=:projectId
  and mdx.attribute='Enhance.Success'
  and mdx.value LIKE '%ScriptA%')

The basic assumption I make is that once the pipeline successfully finishes, this query should return 0 rows (I'm also making the assumption that the script didn't fail against any items). The assumption doesn't hold true, though, and this query will return thousands of rows. The only explanation is that many contentdescriptors didn't get batched. The good thing about this query is that you don't have to modify it; you just keep executing the pipeline until the result set of the query is 0.
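The workaround of re-running until the query comes back empty can be sketched as a loop. count_unprocessed and run_pipeline are hypothetical hooks standing in for "run the query" and "execute the pipeline"; nothing in this thread suggests VCM exposes them under those names. The simulated numbers are illustrative only.

```python
from typing import Callable


def run_until_drained(count_unprocessed: Callable[[], int],
                      run_pipeline: Callable[[], None],
                      max_runs: int = 10) -> int:
    """Re-run the pipeline until the 'not yet processed' query returns 0 rows.

    Returns the number of pipeline runs that were needed.
    """
    runs = 0
    remaining = count_unprocessed()
    while remaining > 0:
        if runs >= max_runs:
            raise RuntimeError(f"{remaining} items left after {runs} runs")
        run_pipeline()
        runs += 1
        now = count_unprocessed()
        if now >= remaining:
            # No progress: more likely script failures than missing batches.
            raise RuntimeError(f"no progress, {now} items still unprocessed")
        remaining = now
    return runs


# Toy stand-in for VCM: each run stops creating batches after 1,000 batches
# of 25 items, so anything beyond 25,000 items is left for the next run.
state = {"remaining": 60_452}


def fake_count() -> int:
    return state["remaining"]


def fake_run() -> None:
    state["remaining"] -= min(state["remaining"], 1_000 * 25)


print(run_until_drained(fake_count, fake_run))  # 3: 60,452 -> 35,452 -> 10,452 -> 0
```

The no-progress check matters because the loop relies on the assumption stated above that the script never fails against an item; if it does, the same items come back every time and re-running alone will not drain the query.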

As far as an environmental explanation, I don't see it. I've encountered the issue against Oracle, DB2 and SQL Server 2005 backends.

I always figured the issue had something to do with the batchPrepare.query, but really I don't have the foggiest idea.


Recreating the problem.

Thanks for allowing this conversation to go absolutely nowhere for 6 days before chiming in... Considering you're sitting next to Adrian, you could have saved him the trouble of asking whether or not this problem has been seen before.

So is this problem totally repeatable? Can you provide a complete step-by-step guide to recreating the problem on a given database?

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


He knew that I had seen the

He knew that I had seen the issue before; he also knew that I had no idea why, which I believe is the reason he started the thread.

For what it's worth - I think this thread has turned into a great forum discussion!

As for the repeatability of the problem - no, I don't know how to reproduce the issue on cue. It tends to happen when running the script across upwards of 1000 batches.

Given that we just crawled 94K objects on our current project, I expect the issue to re-surface at some point soon, and when it does, one of us will post the exact steps to reproduce.

For now, we can tell you that we're running VCM against a SQL Server 2005 backend, and that VCM and SQL Server are running on different machines (on the same network, obviously). Both boxes are Windows Server 2003 R2.


What's the precursor

Describe the first ingredient: a pipeline hanging after processing one or more batches. Is this event always the precursor to the batch size problem, i.e. if the pipeline hangs and you restart, will you always get the batch size problem? And is it the case that causing the pipeline to hang is not easily repeatable, which makes this problem difficult to diagnose and recreate?

 

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


I don't know if the pipeline

I don't know if the pipeline hanging is always the precursor, but it's been the most common precursor when I've seen the issue in the past. I always thought the batch size problem had something to do with using a convoluted, nested batchPrepare.query like the one shown above, and the most common circumstance in which I'd use a query like that is after a pipeline hung. But my confidence level in this diagnosis is not high.

I'm well aware this isn't the most helpful explanation of the problem; it makes it difficult for others to diagnose when I myself don't even have the steps to reproduce crystallized in my head.

If and when this happens again across our 94K project, Adrian and I will post far more detailed re-creation steps :)


Reproduced the issue!

Adrian and I have reproduced the issue described above. Here are the exact steps to reproduce:

1) Executed a pipeline containing one per-object script against the straightforward batchPrepare.query of "find all html content", which returned 60,452 rows. The batchSize of the project is 25.

2) We checked the Job and Batch tables with the following query:

select b.*
from job j,
batch b
where j.id = '8a70c2531afdcfe5011afe9af6750003' and
j.id = b.jobid
order by b.state

The query returned 2419 rows, which is exactly what we expect.

3) We checked the queue size in JConsole: it was 890, which is less than we expected.

4) The pipeline hung at the very end of the first batch, but before the second batch started. We were later able to determine that (1) this batch never took on a state of 'FINISHED', but (2) there were 0 rows remaining in the batchitem table with this batchid.

5) We waited 15 minutes before closing the VCM Client and Server.

6) We restarted the Vamosa Server and the script resumed straight away, presumably with the 2nd batch, and then we started the Client.

7) The pipeline ran without issue until the queue size reported by JConsole reached 0, at which point the pipeline hung before committing this batch. The above query returned 1529 rows where the batch.state field had a value of 'QUEUED' (and 890 rows where the batch.state field had a value of 'FINISHED').

8) We waited 90 minutes and then restarted Vamosa. No activity resumed, so we opened the Client.
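The figures in these steps are internally consistent, and a few lines of arithmetic make the pattern easier to see: the database holds every expected batch, but only as many batches as JConsole reported ever reached FINISHED. This is just a check on the numbers reported above, not a diagnosis of the cause.

```python
# Figures copied from the numbered steps above.
rows_returned = 60_452          # step 1: rows from the batchPrepare.query
batch_size = 25                 # step 1: project batchSize
batches_in_table = 2_419        # step 2: BATCH rows for the job
jconsole_queue = 890            # step 3: queue size shown in JConsole
finished, queued = 890, 1_529   # step 7: batch.state counts after the hang

expected = (rows_returned + batch_size - 1) // batch_size  # ceiling division

print(expected == batches_in_table)           # True: all batches exist in the DB
print(finished + queued == batches_in_table)  # True: every batch is in one of two states
print(finished == jconsole_queue)             # True: only what JConsole queued finished
print(queued * batch_size)                    # 38225: upper bound on items left over
```

The follow-up query in the next post returned 38,177 items, just under that 38,225 bound, which is consistent with the stranded QUEUED batches accounting for nearly all of the unprocessed items.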

 

 

That's where we are now. The next thing we're going to do is change the batchPrepare.query to:

FROM com.vamosa.content.ContentDescriptor cd
JOIN cd.metadataSet md
where cd.project.id=:projectId
and md.attribute='Identify Metadata.Content-Type'
and md.value like '%text/html%'
and cd.id not in
  (select cdx.id FROM com.vamosa.content.ContentDescriptor cdx
  JOIN cdx.metadataSet mdx
  where cdx.project.id=:projectId
  and mdx.attribute='Enhance.Success'
  and mdx.value LIKE '%Standardize Pages%')

... and restart the pipeline.


Further Info

To follow up on this... after restarting the pipeline with the above query, which returns 38,177 items, we watched the number of queued items in JConsole increase as the first batch was being processed. The number of queued items in JConsole got to 1386, at which point the server ground to a halt just after the last item of the 2nd batch, and we had to stop and start the server again to kick-start the processing of the job. A quick query on the database again showed that this batch was still QUEUED but no longer had any items listed in the batchitem table.

Eventually this job again ground to a halt after the queued items in JConsole got to 0.

This leaves us with 152 QUEUED batches in the batch table for this job.

Stopping and starting the server didn't kick-start anything this time, so we've reset the project state and started the job again against the same query to capture any unprocessed objects. This time JConsole correctly reported 152 queued batches, and the server is happily processing without grinding to a halt in the early stages of the queued batches.


Thanks guys!!!

I shall be scratching my head in honour of you... I have some ideas...

Will get back to you

 

Regards,
Ijonas Kisselbach.
Chief Technology Officer.


FYI...  We've just kicked

FYI...

We've just kicked off another per-object script on this same set of data.

After reading through the post about Settings For ActiveMQ, I figured it would be worth modifying the value of the beans/memoryManager/usageManager:limit setting.

The previous value we had set was 400 MB, as per the recommended setting in that post for a server with 2GB of memory. The server we're running on has 8GB of memory. I changed the setting to 800 MB and all seemed fine. When we kicked off the script this time, it got as far as the 2nd item of the 3rd batch before coming to a halt; by then it had generated 1775 batches in JConsole. Stopping and starting the server kick-started the processing again, and I expect it to run to the end of the 1775 batches, which will account for approximately 75% of the items returned by the batchPrepare query.

 

Is there a known limit to the max value for this setting? When experimenting on my laptop I was able to set it to pretty much any value, I went as high as 4000 MB (twice the size of the memory in my laptop) without any obvious side effects but I don't have a project with a huge amount of data on my laptop to test it against.

 

FWIW... our JAVA_OPTS settings are:

JAVA_OPTS=%JAVA_OPTS% -Xms512m -Xmx1024m -XX:MaxPermSize=512m

I tried experimenting with the Xmx value, but Vamosa Server didn't seem to like going beyond 1024m, giving the following error:

Error occurred during initialization of VM

Could not reserve enough space for object heap

Could not create the Java virtual machine.

 


Further Info

To follow up on this...

The script eventually finished this morning and, as expected, it completed the 1775 batches that had been generated without any further issues. This left us with 16K+ items still unprocessed, so we ran the script again using a query similar to the one mentioned previously to grab the remaining unprocessed items. This time it generated enough batches straight off the bat and didn't require a stop/start of the server to complete. So the change to the memory manager limit seems to have at least improved the situation enough for us to complete the task with just 2 runs through the script, as opposed to the 3 or 4 runs it was taking previously. This second run has now completed without any further issues, and all items have been processed.

While investigating, I also came across this post from Ross back in November 2007, which I hadn't seen before. It describes a similar situation, except with an Oracle DB, where the server stops processing part way into the batches (http://groups.vamosa.com/node/2302). Interestingly, he also decided to raise the memory manager limit to 800 and saw an improvement in performance.