Hi,
I have been trying to crawl content, process it and then load it back into a CMS. I have come across a problem when executing an Asset Load sub project task pipeline, and I am wondering if anyone can shed any light on what the problem might be?
I have 3 assets in the system after crawling, feature extracting and classifying, and then running an Import Content sub task pipeline (clear project -> import content -> GCMS Metadata -> Extract Metadata). When I query the content, all 3 assets have GenericCMS.Filename set, but all of them also show Classifier.Classified = false. I can view the content of each asset and it looks fine. Example metadata snippets for 1 of these assets:
Classifier.Classified = false
Enhance.Success = Feature Extract, Classify, Classify, GCMS Metadata, Extract Metadata
HttpEquiv.Content-Type = text/html; charset=utf-8
When I run the Asset Load pipeline after running Import Content, I see errors in the JBoss log:
12:54:06,983 INFO [JobManagerServiceImpl] Executing taskpipeline[Import Content] against project[Assets] with 4 tasks.
12:54:07,374 INFO [JobManagerServiceImpl] Adding taskDef[Import Content] to job.
12:54:07,374 INFO [JobManagerServiceImpl] Adding taskDef[GCMS Metadata] to job.
12:54:07,390 INFO [JobManagerServiceImpl] Adding taskDef[Extract Metadata] to job.
12:54:07,936 INFO [ProcessEngineManagerImpl] Generating task[com.vamosa.process.Task@14811b6,Clear Project] against project[410
c011d6dab66440049,Assets]
12:54:07,983 INFO [Job] Clearing content and dependent data from project[Assets].
12:54:08,015 INFO [Job] [1/7] Deleting Batch Items ...
12:54:08,155 INFO [Job] [2/7] Deleting Outbound Links ...
12:54:08,311 INFO [Job] [3/7] Deleting Metadata ...
12:54:08,405 INFO [Job] [4/7] Deleting Content ...
12:54:12,858 INFO [Job] [5/7] Deleting ContentDescriptors ...
12:54:12,890 INFO [Job] [6/7] Deleting Redirection Entries ...
12:54:12,936 INFO [Job] [7/7] Deleting CrawlStates ...
12:54:12,936 INFO [Job] Synchronizing database ...
12:54:12,968 INFO [ProcessEngineManagerImpl] Successfully executed Clear Project:
12:54:12,968 INFO [ProcessEngineManagerImpl] Generating task[com.vamosa.process.Task@1961ad6,Import Content] against project[41
5c011d6dab66440049,Assets]
12:54:12,983 INFO [Job] Importing content into project[Assets]
12:54:13,296 INFO [AsyncImporterMDPojo] Importing content from batch [4103e2111d76f40b011d770459c2000d] into project [Assets]
12:54:13,327 INFO [AsyncImporterMDPojo] Importing content descriptor [-3051926395955019591,http://win2kt1.magus.co.uk/drupal-6.
and related content into project [Assets].
12:54:13,593 INFO [AsyncImporterMDPojo] Importing content descriptor [-3051926395955019648,http://win2kt1.magus.co.uk/drupal-6.
ted content into project [Assets].
12:54:13,655 INFO [AsyncImporterMDPojo] Importing content descriptor [-3051926395955019751,http://win2kt1.magus.co.uk/drupal-6.
and related content into project [Assets].
12:54:13,796 INFO [ProcessEngineManagerImpl] Generating task[com.vamosa.process.Task@976386,Batch Preparation] against project[
035c011d6dab66440049,Assets]
12:54:13,890 INFO [ProcessEngineManagerImpl] Successfully executed Batch Preparation: Batch preparation succesful.
12:54:13,983 INFO [ProcessEngineManagerImpl] Executing {2} tasks against batch[4103e2111d76f40b011d77045c530011]
12:54:13,999 INFO [ProcessEngineManagerImpl] Applying {2} tasks against {3} batch items.
12:54:14,030 INFO [ProcessEngineManagerImpl] Invoking task on 3 items using 1 threads.
12:54:14,796 INFO [STDOUT] *sys-package-mgr*: can't create package cache dir, 'C:\vamosa\jboss\server\default\tmp\deploy\tmp555
ing-exp.war\WEB-INF\lib\jython.jar\cachedir\packages'
12:54:16,046 INFO [Job] Enhancing content descriptor[-1067788941104873245,http://win2kt1.magus.co.uk/drupal-6.5/] with enhance
tadata]
12:54:16,124 INFO [Job] Enhancing content descriptor[-1067788941104873441,http://win2kt1.magus.co.uk/drupal-6.5/?q=node/2] with
k[GCMS Metadata]
12:54:16,140 INFO [Job] Enhancing content descriptor[-1067788941104873344,http://win2kt1.magus.co.uk/drupal-6.5/?q=node/1] with
k[GCMS Metadata]
12:54:16,155 INFO [ProcessEngineManagerImpl] Invoking task on 3 items using 1 threads.
12:54:16,186 INFO [Job] Enhancing content descriptor[-1067788941104873245,http://win2kt1.magus.co.uk/drupal-6.5/] with enhance
Metadata]
12:54:16,249 INFO [Job] Enhancing content descriptor[-1067788941104873441,http://win2kt1.magus.co.uk/drupal-6.5/?q=node/2] with
k[Extract Metadata]
12:54:16,280 INFO [Job] Enhancing content descriptor[-1067788941104873344,http://win2kt1.magus.co.uk/drupal-6.5/?q=node/1] with
k[Extract Metadata]
12:54:16,358 INFO [ProcessEngineManagerImpl] --- Pipeline finished ---
12:54:16,374 INFO [ProcessEngineManagerImpl] --- Processing time: 0:0:8.984
12:54:28,827 INFO [JobManagerServiceImpl] Executing taskpipeline[Asset Load] against project[Assets] with 1 tasks.
12:54:29,077 INFO [ProcessEngineManagerImpl] Generating task[com.vamosa.process.Task@d4ae94,Batch Preparation] against project[
035c011d6dab66440049,Assets]
12:54:29,108 INFO [ProcessEngineManagerImpl] Successfully executed Batch Preparation: Batch preparation succesful.
12:54:29,140 INFO [ProcessEngineManagerImpl] Executing {1} tasks against batch[4103e2111d76f40b011d770497e50023]
12:54:29,155 INFO [ProcessEngineManagerImpl] Applying {1} tasks against {3} batch items.
12:54:29,171 INFO [ProcessEngineManagerImpl] Invoking task on 3 items using 1 threads.
12:54:29,233 INFO [Job] Loading asset with content from http://win2kt1.magus.co.uk/drupal-6.5/?q=node/1
12:54:29,249 INFO [STDOUT] Bad Base64 input character at 0: 60(decimal)
12:54:29,265 ERROR [ProcessEngineManagerImpl] Unable to execute task. java.lang.NullPointerException
java.util.concurrent.ExecutionException: java.lang.NullPointerException
at com.vamosa.process.SynchronousFuture.get(SynchronousFuture.java:29)
at com.vamosa.process.ProcessEngineManagerImpl.executeTaskAgainstBatch(ProcessEngineManagerImpl.java:565)
at com.vamosa.process.AsyncJobHandlerMDPojo.A(AsyncJobHandlerMDPojo.java:78)
at com.vamosa.process.AsyncJobHandlerMDPojo.onMessage(AsyncJobHandlerMDPojo.java:56)
at sun.reflect.GeneratedMethodAccessor324.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:281)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:187)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:154)
at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:107)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:176)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:210)
at $Proxy60.onMessage(Unknown Source)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.j
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.jav
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.
at org.springframework.jms.listener.DefaultMessageListenerContainer.doReceiveAndExecute(DefaultMessageListenerContainer.
at org.springframework.jms.listener.DefaultMessageListenerContainer.receiveAndExecute(DefaultMessageListenerContainer.ja
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMe
rContainer.java:904)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListen
java:857)
at org.springframework.core.task.SimpleAsyncTaskExecutor$ConcurrencyThrottlingRunnable.run(SimpleAsyncTaskExecutor.java:
at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.NullPointerException
at com.vamosa.tasks.load.GCMSConnector.doLoadAsset(GCMSConnector.java:144)
at com.vamosa.tasks.load.BaseCMSConnector.loadAsset(BaseCMSConnector.java:92)
at com.vamosa.tasks.AssetLoad.execute_inner(AssetLoad.java:58)
at com.vamosa.tasks.AbstractTask.execute(AbstractTask.java:345)
at com.vamosa.tasks.AbstractPerObjectTask.call(AbstractPerObjectTask.java:234)
at com.vamosa.tasks.AbstractPerObjectTask.call(AbstractPerObjectTask.java:25)
at com.vamosa.process.SynchronousFuture.get(SynchronousFuture.java:27)
... 22 more
I can see the first XML file generated in the output directory (out of what I presume should be 3 files). This file has no content (0 KB), and no other files are generated.
Any pointers on how to get the content processed correctly and loaded back into a CMS would be appreciated.
Thanks
Assets are binary
Hi,
It sounds as though you are confusing text/html objects with assets, which are binary files such as images and PDFs. When you are dealing with text/html content, the sub project type should be PLACEHOLDER or CONTENT, not ASSETS.
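As a rough illustration of that distinction, here is a minimal Python sketch that suggests a sub project type from a Content-Type metadata value such as the HttpEquiv.Content-Type shown in your post. The function name and the "anything under text/ is content" rule are my own assumptions for illustration; this is not part of the product's API, and PLACEHOLDER may be the better choice in some setups.

```python
def suggested_sub_project_type(content_type_value):
    """Suggest a sub project type from a Content-Type metadata value.

    Hypothetical helper: the mapping rule below is an assumption,
    not something the product itself exposes.
    """
    # Drop parameters such as "; charset=utf-8" and normalise the case.
    media_type = content_type_value.split(";", 1)[0].strip().lower()
    # Assumption: textual media types are content; everything else is
    # treated as a binary asset (images, PDFs and so on).
    if media_type.startswith("text/"):
        return "CONTENT"
    return "ASSETS"


print(suggested_sub_project_type("text/html; charset=utf-8"))
print(suggested_sub_project_type("application/pdf"))
```

Running a check like this over the metadata before choosing a pipeline would flag text/html items that have ended up in an ASSETS sub project.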
This could be the problem: connectors are generally executed based on the sub project type, so the connector you are using will be attempting to create an asset from binary data. Your text/html files contain no binary data, which could result in an exception when the connector attempts to update the file with the content.
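The "Bad Base64 input character at 0: 60(decimal)" line in your log fits this reading: character code 60 is "<", which is how an HTML document typically starts. The Python sketch below only illustrates that point. It assumes (I have not confirmed this from the connector source) that the connector tries to Base64-decode the stored content, and the helper name is made up for the example.

```python
import base64
import binascii


def is_valid_base64(text):
    """Return True if text decodes cleanly as strict Base64."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently skipping them.
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


# Illustrative inputs, not taken from the actual project.
html_content = "<html><body>Hello</body></html>"
encoded_binary = base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii")

print(ord(html_content[0]))             # character code of "<"
print(is_valid_base64(html_content))    # HTML text is not Base64
print(is_valid_base64(encoded_binary))  # encoded binary data is
```

If the content of your items starts with "<" like this, a decoder expecting Base64 will reject it at position 0, which matches the message in the log.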
Can you attempt the load using the correct sub project type for text/html?
Regards,
Ross