Batch Processing with NetKernel

Introduction

It is often useful to process many documents in a single batch. This example describes a pattern for batch processing that can be adapted to more sophisticated tasks, such as batch converting a website from HTML to XHTML.

Specification

This example batch process recursively finds all system.xml files below a given directory and counts the number of elements each one contains. The results are presented as an HTML table. This assumes the process is started by a web-browser request, but it could equally be initiated through the command-line transport, email, or any other available transport.

Overview

Below is a DPML batch process. Its detailed comments will help you adapt it to your needs. Here we summarize its three stages and suggest ways the pattern can be modified.

Prepare resource list

Any batch process requires a list of resources to process. The example uses the fls accessor to produce a document listing the system.xml files below a root directory; you can change the root and filter to suit your system. Alternatively, you could supply a hand-crafted source document containing the URIs of the files you want to process as uri elements. Note that, since NetKernel provides a URI resolver infrastructure, you can source resources from anywhere, not just the local filesystem. For example, you could use this pattern as the basis for a web-bot.
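As a minimal sketch of the hand-crafted alternative, the fls instruction at the top of the listing could be replaced by a copy instruction that writes a literal resource list into var:fls. The loop only ever selects /descendant::uri[1], so any document containing uri elements will do. The files wrapper element and the two file paths below are hypothetical placeholders.

```xml
<!-- Hypothetical replacement for the fls instruction: a hand-crafted resource list. -->
<!-- The loop only selects /descendant::uri[1], so the wrapper element name is arbitrary. -->
<!-- Both paths are placeholders; substitute the resources you actually want to process. -->
<instr>
  <type>copy</type>
  <operand>
    <files>
      <uri>file:///path/to/first/system.xml</uri>
      <uri>file:///path/to/second/system.xml</uri>
    </files>
  </operand>
  <target>var:fls</target>
</instr>
```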

Iterate over resource list

The example process iterates over the resource list generated by fls. Each uri element in the source document identifies the resource handled by one pass of the inner loop. In our example we perform an XQuery operation to count the elements. We've used an XML process as an example, but you could do anything at all here; your resources could even be non-XML. One useful application is to use XHTMLTidy to batch convert HTML files to valid XHTML.
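Stripped of the element-counting step, the iteration in the listing reduces to the skeleton below. The while condition tests whether any uri element remains in var:fls, each pass copies the first one into var:uri, and the final stm delete consumes it so the loop terminates. This sketch uses only instructions taken from the full listing; the comment in the middle marks where your own processing would go.

```xml
<while>
  <!-- Keep looping while var:fls still contains at least one uri element -->
  <cond>
    <instr>
      <type>xpatheval</type>
      <operand>var:fls</operand>
      <operator>
        <xpath>/descendant::uri[1]</xpath>
      </operator>
      <target>this:cond</target>
    </instr>
  </cond>
  <seq>
    <!-- Take the first remaining uri element as the current item -->
    <instr>
      <type>copy</type>
      <operand>var:fls#xpointer(/descendant::uri[1])</operand>
      <target>var:uri</target>
    </instr>
    <!-- Your processing of var:uri goes here (XQuery, XHTMLTidy, another DPML process...) -->
    <!-- Remove the item just processed; without this the loop never ends -->
    <instr>
      <type>stm</type>
      <operand>var:fls</operand>
      <operator>
        <stm:group xmlns:stm="http://1060.org/stm">
          <stm:delete xpath="/descendant::uri[1]" />
        </stm:group>
      </operator>
      <target>var:fls</target>
    </instr>
  </seq>
</while>
```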

In our example we do not modify the target resource, but you could just as easily write the results of your process back to the target resource URI. Take care, though: such writes are permanent and unrecoverable. It's usually a good idea to first execute a test process that simply logs the result, to make sure your process is working as you expect. Once everything works, you can add the URI writeback.
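For the trial run suggested above, one option is to log each result instead of writing it anywhere. A minimal sketch, reusing the log instruction from the listing: placed immediately after the xquery instruction inside the loop, it writes var:result to the log on every iteration, so you can inspect the output before adding any writeback step.

```xml
<!-- Dry run: log the per-file result rather than writing it back to the target resource -->
<instr>
  <type>log</type>
  <operand>var:result</operand>
</instr>
```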

Show results

In our example we accumulate the result of each iteration in a variable, var:results. This document could be used for more extensive reporting, including any exceptions that occurred. To keep the example simple, we've provided only basic exception handling. Finally, the results are styled and presented as an HTML table. You could instead write them to a file or use them as the input to another process.
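To write the results to a file instead of (or as well as) styling them, a copy instruction could take var:results as its operand and a file: URI as its target, placed just before the final xslt instruction. This is a sketch under two assumptions: the output path is a hypothetical placeholder, and your NetKernel configuration must permit writes to that file: location.

```xml
<!-- Assumption: the file: scheme is writable in your configuration; the path is a placeholder. -->
<!-- var:results has the form <results><result><uri>...</uri><count>...</count></result>...</results> -->
<instr>
  <type>copy</type>
  <operand>var:results</operand>
  <target>file:///tmp/batch-results.xml</target>
</instr>
```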

Executing this process

To execute this DPML process, you need to create a host module.

  1. Use the new module wizard to create and install a new module. Accept the default settings, making sure your module supports DPML. Be sure to choose to import the module into the front-end fulcrum; this makes it available on localhost port 8080 by default. Your new module will be located in <install>/modules/your_module_name/.
  2. The example process uses the fls accessor supplied by the ext_sys module and the xquery accessor supplied by ext_xquery. These modules must be imported into your module by adding the following two imports to the mapping section of the module.xml definition in your module's root directory.
    <mapping>
      <!-- ...existing imports... -->
      <import>
        <uri>urn:org:ten60:netkernel:ext:sys</uri>
      </import>
      <import>
        <uri>urn:org:ten60:netkernel:ext:xquery</uri>
      </import>
    </mapping>
    You must then perform a cold restart of NetKernel to pick up the module changes.
  3. Finally, copy the batch process listing below into a file named batch.idoc in the resources/ directory of your module. Edit the fls instruction so that its root points to a directory on your own system (the listing uses the author's file:///home/pjr/dev/) and, if you wish, change the regex filter to match different file names.
  4. Start the process by requesting http://localhost:8080/batch.idoc with a web browser.

Deadlock Detector Exception

Searching the filesystem can take a long time, depending on how deep your filesystem tree is. If the fls search runs for a very long time, you may encounter a NetKernel Deadlock Detector exception. This is thrown because the kernel monitors all scheduled requests; if no response is received within a set interval, the kernel kills the request and issues an error. In a web application this is very valuable, but it can be unhelpful for batch processing. You can increase the deadlock detection period in the kernel configuration.

<idoc>
  <seq>
    <comment> ****************************************** A Batch Processing Pattern. This example finds all system.xml documents and counts the number of elements they contain. You can adapt it to suit your needs. ****************************************** </comment>
    <comment> *********** Use the File LS (fls) accessor to list files. Modify the root for your filesystem. Modify the filter regex to target other XML files. The result is a tree of matching resources, each with a uri element containing the URI of the resource. We'll use this as the source for the batch process. *********** </comment>
    <instr>
      <type>fls</type>
      <operator>
        <fls>
          <root>file:///home/pjr/dev/</root>
          <filter>.*system\.xml</filter>
          <recursive />
          <uri />
        </fls>
      </operator>
      <target>var:fls</target>
    </instr>
    <comment> ************* Prepare a results document ************** </comment>
    <instr>
      <type>copy</type>
      <operand>
        <results />
      </operand>
      <target>var:results</target>
    </instr>
    <comment> *********** Start batch processing loop *********** </comment>
    <while>
      <comment> *********** Loop condition - do processing sequence while there's a file URI left to process *********** </comment>
      <cond>
        <instr>
          <type>xpatheval</type>
          <operand>var:fls</operand>
          <operator>
            <xpath>/descendant::uri[1]</xpath>
          </operator>
          <target>this:cond</target>
        </instr>
      </cond>
      <seq>
        <comment> *********** Copy the URI fragment to a variable and log it to show progress *********** </comment>
        <instr>
          <type>copy</type>
          <operand>var:fls#xpointer(/descendant::uri[1])</operand>
          <target>var:uri</target>
        </instr>
        <instr>
          <type>log</type>
          <operand>var:uri</operand>
        </instr>
        <comment> *********** Main Process - We could do anything we liked here including executing another dpml process or modifying the target file in some way. Here we simply count the elements in the file. *********** </comment>
        <instr>
          <type>xquery</type>
          <operator>
            <xquery>
              (: ********* Declare the external URI variable and extract the file URI to $file variable ********* :)
              declare variable $uri as node() external;
              declare variable $file {$uri/uri/text()};
              (: ******* Return a fragment: Quote back the URI fragment and add a count element
                 with the number of elements contained in the target document ******* :)
              &lt;result&gt;
                {$uri}
                &lt;count&gt; {count(doc($file)/descendant::*)} &lt;/count&gt;
              &lt;/result&gt;
            </xquery>
          </operator>
          <uri>var:uri</uri>
          <target>var:result</target>
        </instr>
        <comment> *********** Append the xquery result to our cumulative var:results document *********** </comment>
        <instr>
          <type>stm</type>
          <operand>var:results</operand>
          <operator>
            <stm:group xmlns:stm="http://1060.org/stm">
              <stm:append xpath="/results">
                <stm:param xpath="/result:sequence/result:element/result" />
              </stm:append>
            </stm:group>
          </operator>
          <param>var:result</param>
          <target>var:results</target>
        </instr>
        <comment> ********** Exception: Catch any processing exceptions... ********** </comment>
        <exception>
          <comment> ********** Since this is a simple example we'll just log the exception; you can add more extensive error handling for your process if required... ********** </comment>
          <instr>
            <type>log</type>
            <operand>this:exception</operand>
          </instr>
        </exception>
        <comment> *********** Remove the first URI from the file listing before starting next iteration of the loop. If this isn't done we'll have an infinite loop!!! *********** </comment>
        <instr>
          <type>stm</type>
          <operand>var:fls</operand>
          <operator>
            <stm:group xmlns:stm="http://1060.org/stm">
              <stm:delete xpath="/descendant::uri[1]" />
            </stm:group>
          </operator>
          <target>var:fls</target>
        </instr>
      </seq>
    </while>
    <comment> *********** All done. Style the results for presentation. *********** </comment>
    <instr>
      <type>xslt</type>
      <operand>var:results</operand>
      <operator>
        <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
          <xsl:output method="html" />
          <xsl:template match="/results">
            <html>
              <body>
                <h1>Batch Results</h1>
                <table>
                  <tr bgcolor="#aaaaaa">
                    <td>file</td>
                    <td>elements</td>
                  </tr>
                  <xsl:for-each select="result">
                    <tr>
                      <td>
                        <xsl:value-of select="uri" />
                      </td>
                      <td>
                        <xsl:value-of select="count" />
                      </td>
                    </tr>
                  </xsl:for-each>
                </table>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>
      </operator>
      <target>this:response</target>
    </instr>
  </seq>
</idoc>
© 2003-2005, 1060 Research Limited. 1060 registered trademark, NetKernel trademark of 1060 Research Limited.