Static Export after publish in the Background


A static export after publish can take a long time, and users normally do not want to wait for the export run to complete after they have published something. It is therefore desirable to have the export proceed in the background. This can be accomplished with a custom static export handler, which is described here.

System Requirements

This export handler was tested and is used successfully in production with OpenCms 6.2.1. It is expected to work with all versions of OpenCms 6.2.x and probably with all versions of OpenCms 6.0.x, too.

Functionality

This export handler is a wrapper around the original export handler that is normally used for static export after publish. The difference is that the export runs in a background task, so users do not have to wait for a static export run to complete before they can continue working after publishing something.

Please see the source code below for further comments.
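
The wrapper idea described above can be sketched independently of OpenCms. The following is only a minimal illustration, not the handler itself: ExportHandler, BackgroundExportWrapper and waitUntilIdle are hypothetical names invented for this sketch, and the publish request is reduced to a plain string. It shows a caller handing a request to a low-priority worker thread and returning immediately, while the worker drains a FIFO queue and ends itself once the queue is empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a slow, synchronous export handler. */
interface ExportHandler {
    void export(String publishId);
}

/** Wraps a slow handler so that callers return immediately. */
class BackgroundExportWrapper implements ExportHandler {
    private final ExportHandler delegate;
    private final List<String> queue = new ArrayList<String>();
    private Thread worker; // null while no background run is active

    BackgroundExportWrapper(ExportHandler delegate) {
        this.delegate = delegate;
    }

    /** Queues the request and starts a worker thread if none is running. */
    public synchronized void export(String publishId) {
        queue.add(publishId);
        if (worker == null) {
            worker = new Thread(new Runnable() {
                public void run() {
                    drain();
                }
            });
            worker.setPriority(Thread.MIN_PRIORITY);
            worker.start();
        }
    }

    /** Worker loop: process requests in FIFO order until the queue is empty. */
    private void drain() {
        while (true) {
            String next;
            synchronized (this) {
                if (queue.isEmpty()) {
                    worker = null; // allow a later request to start a new worker
                    notifyAll();
                    return;
                }
                next = queue.remove(0);
            }
            delegate.export(next); // the slow work happens outside the lock
        }
    }

    /** Waits up to the given time for the worker to finish; for demos and tests. */
    synchronized boolean waitUntilIdle(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (worker != null) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                wait(remaining);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }
}
```

Calling export(...) several times in a row returns at once each time, and the wrapped handler then sees the requests one after another on the worker thread. The real handler below follows the same pattern, but additionally pauses between export runs and discards redundant requests.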

Configuration

  • This export handler is configured in exactly the same way as the original static export handler for static export after publish. Start by creating a working export configuration for the export handler org.opencms.staticexport.CmsAfterPublishStaticExportHandler. For details, you may want to read the description at http://www.opencms-forum.de/opencms-forum/viewthread?thread=1279.
  • Once that works, download the jar file and place it into the WEB-INF/lib folder of your OpenCms installation.
  • In your OpenCms static export configuration (opencms-importexport.xml), replace org.opencms.staticexport.CmsAfterPublishStaticExportHandler with com.berzinarchives.cms.staticexport.BaAfterPublishBgrExportHandler. This will cause the new export handler to be used. Then restart OpenCms.

The next time you publish something, the static export should run in the background.

Source Code (com/berzinarchives/cms/staticexport/BaAfterPublishBgrExportHandler.java)

Notice: the source code requires at least Java 1.5 to compile.


package com.berzinarchives.cms.staticexport;

import java.util.List;
import java.util.Locale;
import java.util.Vector;

import org.opencms.db.CmsPublishedResource;
import org.opencms.file.CmsObject;
import org.opencms.main.CmsException;
import org.opencms.main.OpenCms;
import org.opencms.report.CmsLogReport;
import org.opencms.report.I_CmsReport;
import org.opencms.staticexport.CmsAfterPublishStaticExportHandler;
import org.opencms.staticexport.I_CmsStaticExportHandler;
import org.opencms.util.CmsUUID;

/**
 * This source code is released to the public domain and can be used and modified freely.
 * Questions can be sent to the opencms-development mailing list at www.opencms.org.
 *
 * This is a wrapper around {@link CmsAfterPublishStaticExportHandler}. To use
 * this class, the best idea is to first configure publishing with
 * CmsAfterPublishStaticExportHandler. After this works, then the reference to
 * org.opencms.staticexport.CmsAfterPublishStaticExportHandler in
 * opencms-importexport.xml can be replaced by a reference to this class,
 * com.berzinarchives.cms.staticexport.BaAfterPublishBgrExportHandler. A
 * configuration that works with CmsAfterPublishStaticExportHandler should be
 * usable with this class without changing anything except the class name of the
 * export class.
 * 
 * Like the original CmsAfterPublishStaticExportHandler class, this class
 * will export all resources of the website whenever something is published, but
 * the export operation will take place in the background so that nobody has to
 * wait for it to finish.
 * 
 * All export operations will take place in a low-priority background thread and
 * all messages will be written to the log (with log level INFO). If publishing
 * is still running and more publish requests are triggered, then these
 * additional publish requests will be queued and processed one after another by
 * the same background thread.
 * 
 * 
 * This class does a simple "optimization" to protect against the queue of
 * pending publish operations becoming too big, if many people publish in a
 * short time for the same website. Normally, in this case the queue would
 * become very big, because people might quickly publish many individual
 * resources and each of these publish operations would add another publish
 * request for the complete website to the queue. Therefore, this class will
 * discard an incoming publish request if there is a pending publish request in
 * the queue that is already going to export the same resources. Currently, this
 * will only affect the default website (/sites/default). The mechanism is as
 * follows: <br>
 * 1 - Check which resources are contained in the current request. This is a
 * very primitive optimization: we check whether the current request contains
 * *only* resources from the default website.<br>
 * 2 - If this is the case and another publish request in the queue also
 * contains at least *some* resources from the default website, then we can
 * safely discard this publish request, because we already have a publish
 * request pending that is going to export the default website.<br>
 * 3 - If the request contains resources that do *not* belong to the default
 * website, then we will always add this request to the queue.
 * 
 * 
 * NOTICE: The optimization is based on two major assumptions:<br>
 * 1 - The published resources should almost always come *only* from the default
 * website, otherwise the optimization will not have much effect. (Thus,
 * these checks will only help much for installations where the default website
 * (/sites/default) is the only site that uses static export, or where any other
 * sites are rarely published.)<br>
 * 2 - This logic *strictly assumes* that export is configured in such a way
 * that exporting *one* resource from the default website will *always* lead to
 * the static export of the *complete* default site. If the export is not
 * configured like that and there are instead export subsets of resources for
 * the default site, then this mechanism will discard too many publish requests!
 * 
 * Date: $Date$
 * 
 * @version $Revision$
 */
public class BaAfterPublishBgrExportHandler implements I_CmsStaticExportHandler {

	/**
	 * a small inner class for doing the actual background publishing in a
	 * separate background thread
	 * 
	 * Date: $Date$
	 * 
	 * @version $Revision$
	 * @author Christian Steinert christian_steinert@web.de
	 */
	class BgrThread extends Thread {
		static final String C_DEFAULT_SITE_ROOT = "/sites/default";

		/**
		 * vector with one CmsUUID for each pending publish request
		 */
		Vector<CmsUUID> publishQueue = new Vector<CmsUUID>(5);

		/**
		 * Contains the number of the last dispatch request in the queue which
		 * contains resources from default website. If the queue contains no
		 * publish requests for the default website, then this number is set to
		 * -1
		 */
		int lastQueuePosWithDefaultResources = -1;

		/**
		 * report class to write all publish information into the opencms log
		 */
		CmsLogReport logger;

		/**
		 * default constructor
		 */
		BgrThread() {
			this.logger = new CmsLogReport(
					Locale.getDefault(),
					com.berzinarchives.cms.staticexport.BaAfterPublishBgrExportHandler.class);
		}

		/**
		 * add another publish task to the background export thread. This must
		 * be called at least once before starting this thread.
		 * 
		 * @param publishHistoryId
		 *            ID of a publish operation
		 * 
		 * @return true, if the publish request really needs to be published,
		 *         false if the request should just be discarded, because an
		 *         identical or more comprehensive one is in the queue already.
		 * @throws CmsException
		 *             if the published resources cannot be read
		 * 
		 */
		public boolean addPublishTask(CmsUUID publishHistoryId)
				throws CmsException {
			boolean requestContainsNonDefaultResources = false;
			boolean requestContainsDefaultResources = false;

			CmsObject cms = OpenCms.initCmsObject(OpenCms.getDefaultUsers()
					.getUserExport());

			synchronized (LOCK_OBJECT) {
				List publishedResources = cms
						.readPublishedResources(publishHistoryId);

				// check, which types of resources are contained in this publish
				// request (whether it contains resources from the default
				// website and whether it contains resources outside the default
				// site)
				for (Object resObj : publishedResources) {
					CmsPublishedResource res = (CmsPublishedResource)resObj;
					String resName = res.getRootPath();
					if (!resName.startsWith(C_DEFAULT_SITE_ROOT)) {
						// this request contains resources that don't belong
						// to the default site.
						requestContainsNonDefaultResources = true;
					} else {
						// this request contains resources from the
						// default website
						requestContainsDefaultResources = true;
					}

					if (requestContainsDefaultResources
							&& requestContainsNonDefaultResources) {
						// We have found both default and non-default resources
						// in the request. We could not possibly find anything
						// besides that, so we don't need to check further =>
						// exit loop
						break;
					}
				}

				// now check, if the request can be discarded or has to be
				// queued
				if ((!requestContainsNonDefaultResources)
						&& (lastQueuePosWithDefaultResources >= 0)) {
					// The current request contains only resources from the
					// default site and we still have a pending request in
					// the queue that will publish the default site
					return false;
				} else {
					// This request cannot be ignored for one or more of
					// following reasons:
					// - the request contains some resources that don't belong
					// to the default site
					// - OR: there is no pending publish request that contains
					// default resources
					this.publishQueue.add(publishHistoryId);

					if (requestContainsDefaultResources) {
						// The request contains default resources so now it will
						// be the latest pending request that contains default
						// resources
						this.lastQueuePosWithDefaultResources = this.publishQueue
								.size() - 1;
					}
					return true; // yes, publish is necessary
				}

			}
		}

		/**
		 * start the publish operation in Background; continue publishing, as
		 * long as more publish requests come in the meantime through
		 * {@link #addPublishTask(CmsUUID)}
		 * 
		 * @see java.lang.Runnable#run()
		 */
		public void run() {
			// thread: publish pending export requests from the export queue,
			// until nothing new has come in and the queue is empty.
			CmsUUID currentPublishHistoryId;

			while (true) {
				// more work to do?
				synchronized (LOCK_OBJECT) {
					if (this.publishQueue.isEmpty()) {
						// queue is empty => work is finished.
						// clear reference to this thread in outer class
						// and stop working
						BaAfterPublishBgrExportHandler.publishThread = null;
						return;
					} else {
						// get next publish task from FIFO queue
						currentPublishHistoryId = this.publishQueue.remove(0);

						// keep track of the position of the last pending
						// request that contains default resources
						if (this.lastQueuePosWithDefaultResources >= 0)
							this.lastQueuePosWithDefaultResources--;
					}
				}

				// create new export class and start the next export task
				assert currentPublishHistoryId != null;
				new CmsAfterPublishStaticExportHandler()
						.performEventPublishProject(currentPublishHistoryId,
								this.logger);

				// wait for a minute to give the database a break
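				try {
					sleep(60000);
				} catch (InterruptedException ie) {
					// deliberately ignored: an interrupt only shortens the
					// pause before the next queued request is processed
				}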
			}
		}
	}

	/**
	 * this object is used for sync'ing export requests from outside and
	 * channelling them through to the one and only export thread that really
	 * DOES the export work.
	 */
	static final Object LOCK_OBJECT = new Object();

	/**
	 * the actual background publishing class
	 */
	static BgrThread publishThread;

	/**
	 * is this instance still doing something in the original task? The original
	 * task will only initialize the background task and will then, after a
	 * very short amount of time, be finished with its work
	 */
	boolean busy;

	/**
	 * default constructor
	 */
	public BaAfterPublishBgrExportHandler() {
	}

	/**
	 * @see org.opencms.staticexport.I_CmsStaticExportHandler#isBusy()
	 */
	public boolean isBusy() {
		return this.busy;
	}

	/**
	 * trigger the actual export in a separate background thread running
	 * with low priority
	 * 
	 * @see org.opencms.staticexport.I_CmsStaticExportHandler#performEventPublishProject(org.opencms.util.CmsUUID,
	 *      org.opencms.report.I_CmsReport)
	 */
	public void performEventPublishProject(CmsUUID publishHistoryId,
			I_CmsReport report) {

		this.busy = true;

		try {
			// add the new publish request to the queue
			synchronized (LOCK_OBJECT) {
				if (publishThread == null) {
					// Currently no publish thread is running in the background.
					// Create a new one, but register it in the static field
					// only after the request has been accepted. Otherwise a
					// thread that was never started would stay registered and
					// later requests would be queued without being exported.
					BgrThread newThread = new BgrThread();
					boolean taskAccepted = newThread
							.addPublishTask(publishHistoryId);
					if (!taskAccepted) {
						// no need to add this publish request to the queue
						// and therefore no need to start a publishing thread
						this.busy = false;
						return;
					}

					// start the new publish thread
					newThread.setPriority(Thread.MIN_PRIORITY);
					publishThread = newThread;
					publishThread.start();
				} else {
					// there is already a background thread running.
					// add this publish request to its queue.
					publishThread.addPublishTask(publishHistoryId);
				}
			}
		} catch (CmsException e) {
			// for some reason the publish request could not be added
			// (this cannot normally happen and would only be the case
			// if this class is called for a publish request id that
			// doesn't actually exist)
			report.addError(e);
			this.busy = false;
			return;
		}

		// write a notice into report that publishing begins
		report
				.println(
						org.opencms.staticexport.Messages
								.get()
								.container(
										org.opencms.staticexport.Messages.RPT_STATICEXPORT_NONTEMPLATE_RESOURCES_BEGIN_0),
						I_CmsReport.FORMAT_HEADLINE);

		// done. Formally - from the side of the opencms publish dialog -
		// this handler is not busy anymore, after it has triggered backend
		// publishing.
		this.busy = false;
	}
}
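
The discard rule described in the class comment can also be tried out in isolation. The sketch below is a simplified illustration under stated assumptions, not part of the handler: DedupQueueSketch, offer and poll are hypothetical names, requests are identified by plain strings, and the published resources are passed in as a list of root paths instead of being read through a CmsObject. It keeps the same bookkeeping as the handler, namely the queue position of the last pending request that contains resources from the default site.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the "discard redundant requests" rule, without any OpenCms types. */
class DedupQueueSketch {
    static final String DEFAULT_SITE_ROOT = "/sites/default";

    private final List<String> queue = new ArrayList<String>();

    /** Queue position of the last pending request with default-site resources, or -1. */
    private int lastPosWithDefault = -1;

    /**
     * Offers a request to the queue.
     *
     * @return true if the request was queued, false if it was discarded
     */
    synchronized boolean offer(String requestId, List<String> rootPaths) {
        boolean hasDefault = false;
        boolean hasOther = false;
        for (String path : rootPaths) {
            if (path.startsWith(DEFAULT_SITE_ROOT)) {
                hasDefault = true;
            } else {
                hasOther = true;
            }
        }
        if (!hasOther && lastPosWithDefault >= 0) {
            // only default-site resources, and a pending request already covers them
            return false;
        }
        queue.add(requestId);
        if (hasDefault) {
            lastPosWithDefault = queue.size() - 1;
        }
        return true;
    }

    /** Removes and returns the oldest request, or null if the queue is empty. */
    synchronized String poll() {
        if (queue.isEmpty()) {
            return null;
        }
        if (lastPosWithDefault >= 0) {
            lastPosWithDefault--; // every pending request moves one position forward
        }
        return queue.remove(0);
    }
}
```

Note that a request is only treated as covered while the covering request is still waiting in the queue. Once that request has been taken out for export, a new default-site request is queued again, because the running export may already have passed the resources that were just published.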