Building a custom collector

Contents

Building a custom collector

Introduction

Collectors are implemented as standard XP applications. They essentially consist of two main components:

Form

A React based user interface that enable search administrators to instruct the collector in it’s specific activities.

Task

Code that performs the actual retrieval of content for indexing.

This section will guide you through the basic steps of building your own custom collector.

Step 1: Create application

Use Enonic CLI to create an application based on the vanilla starter:

enonic project create -r starter-explorer-collector

The starter ships with the essential setup required for any collector:

Dependencies

build.gradle
dependencies {
	include 'com.enonic.lib:lib-explorer:2.2.1'
}
Remember to use a version of the library that is equal to or newer than the version of Explorer you are using.

Register / Unregister

In order for Explorer to detect available collectors, each collector needs to register, and unregister itself:

main.es
import {register, unregister} from '/lib/explorer';

register({
	appName: app.name,
	collectTaskName: 'collect',
	configAssetPath: 'react/Collector.esm.js',
	displayName: 'My collector'
});

__.disposer(() => {
	unregister({
		appName: app.name,
		collectTaskName: 'collect'
	});
});

Step 2: React form

The starter also provides the essential build system for the React-based user interface.

Some important ingredients that enable this are:

  • node-gradle-plugin

  • webpack

  • babel

  • node_modules

    • @enonic/semantic-ui-react-form

    • @enonic/webpack-esm-assets

    • @enonic/webpack-server-side-js

    • get-value

    • semantic-ui-react

    • set-value

React component

In order for your collector’s configuration user interface to work in Explorer you must provide a React component. Any react component type should be supported, but all examples are functional (since that is the current status quo of react).

The component receives five props from Explorer: . context - Read only object which are change via dispatching actions to its state reducer. . dispatch - Function to send actions to the state reducer. . explorer - static information like contentTypes, fields and sites . isFirstRun - Use this to only do things once during initialization of your form. (And not on every render) . path - Where your form is located in the context object.

context object

This object contains a lot of data used to maintain state by semantic-ui-react-form. In most cases knowing about its values property should be enough.

dispatch function

You can import various reducer actions from semantic-ui-react-form and send the to the reducer using the dispatch function.

explorer object

This object contains information from Explorer about the collector context. The information can be used to make dropdowns in your collectors configuration.

isFirstRun

This is a React ref object whose .current property is initialized to true. If you have any code that should only run once on initialization of you collector component, you can use this object to achieve this. Here is some example code:

import getIn from 'get-value';

export const Collector = ({
	context,
	isFirstRun,
	path
}) => {
	let initialValues = getIn(context.values, path);
	if (isFirstRun.current) {
		isFirstRun.current = false; // Make sure the code block never runs again.
		if (!initialValues) {
			initialValues = {
				aProperty: 'defaultValueForProperty',
			}
		}
	} // if isFirstRun
} // Collector

You can read more about React ref objects here: https://reactjs.org/docs/hooks-reference.html#useref

path

Use to this to fetch your form values and also when dispatching actions to the state reducer.

Example

src/resources/assets/js/react/Collector.jsx
import getIn from 'get-value';
import setIn from 'set-value';
import {Button, Form, Header, Icon, Table} from 'semantic-ui-react';
import {
	setError,
	setSchema,
	setValue,
	setVisited,
	DeleteItemButton,
	Form as EnonicForm,
	Input,
	InsertButton,
	List,
	MoveDownButton,
	MoveUpButton,
	SetValueButton
} from 'semantic-ui-react-form';

function required(value) {
	return value ? undefined : 'Required!';
}

const SCHEMA = {
	uri: (v) => required(v)
};

export const Collector = (props) => {
	const {
		context,
		dispatch,
		explorer,
		isFirstRun,
		path
	} = props;
	let initialValues = getIn(context.values, path);
	if (isFirstRun.current) {
		//console.debug('isFirstRun');
		isFirstRun.current = false;
		dispatch(setSchema({path, schema: SCHEMA}));
		if (!initialValues) {
			initialValues = {
				uri: ''
			};
			dispatch(setValue({path, value: initialValues}));
		}
		return <EnonicForm
			initialValues={initialValues}
			onChange={(values) => {
				//console.debug('Collector onChange values', values);
				dispatch(setValue({path, value: values}));
			}}
			schema={SCHEMA}
		>
			<Form as='div'>
				<Form.Field>
					<Input
						fluid
						label='Uri'
						path='uri'
					/>
				</Form.Field>
			</Form>
		</EnonicForm>;
};

Step 3: Collector task

The actual code to retrieve and return content for indexing is implemented using named tasks.

The most important parts of a collector are:

Progress reporting

In the explorer app there is a page to display Collector status. In order for this page to show useful updated information. The collector tasks needs to send progress information. When your collector task runs

collector.start();

A collector.taskProgressObj will be created automatically. Looking something like this:

collector.taskProgressObj = {
	current: 0,
	info: {
		name: 'Example',
		message: 'Initializing...',
		startTime: '2020...'
	},
	total: 1 // So it appears there is something to do.
}

A collector task may have a set or changing number of operations to perform. You should keep the progress updated something like this:

collector.start();
collector.taskProgressObj.total = initialNumberOfOperations;
while(somethingToDo) {
	collector.taskProgressObj.info.uri = currentUri;
	collector.taskProgressObj.info.message = 'Some useful information';
	collector.progress(); // This will update task progress. So it can be seen.

	// ... do stuff ...

	collector.taskProgressObj.total += foundSomeMoreOperationsToPerform;

	collector.taskProgressObj.current += 1;
}
collector.stop();

Finally when you collector task calls

collector.stop();

It will set current = total and a nice info.message = Finished with ${x} errors.;

Journal

When a collector task is finished. A journal will be persisted. The journal contains information about things that went well, and possible errors. Write to the journal by using addSuccess or addError like this:

try {
	// ... do some stuff that could fail ...
	collector.addSuccess({uri: currentUri});
} catch (e) {
	collector.addError({uri: currentUri, message: e.message});
}

CRUD

When you have collected some information you want to make available for later search you have to persist it. This can be done by calling persistDocument like this:

collector.persistDocument({
	aField: 'aTag', // Optional, perhaps used in aggregation and filtering.
	anotherField: 'anotherTag', // Optional, perhaps used in aggregation and filtering.
	text, // Required!
	title, // Required!
	uri, // Required!
	whatever: 'perhapsAnImageUrl' // Optional, perhaps used when displaying search results.
});

The explorer library Collection class currently does not provide any api for reading and deleting documents. You may connect to the collection repositories via standard Enonic API’s or via other currently undocumented Explorer library functions.

Example code

The complexity of a collector may vary, but as to provide a basic idea, the starter includes a simple example:

src/resources/tasks/collect.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<task>
	<description>Collect</description>
	<form>
		<input name="name" type="TextLine">
			<label>Name</label>
			<occurrences minimum="1" maximum="1"/>
		</input>
		<input name="collectorId" type="TextLine">
			<label>Collector ID</label>
			<occurrences minimum="1" maximum="1"/>
		</input>
		<input name="configJson" type="TextLine">
			<label>Config JSON</label>
			<occurrences minimum="1" maximum="1"/>
		</input>
	</form>
</task>
src/resources/tasks/collect.es
import {Collector} from '/lib/explorer/collector'; (1)

export function run({name, collectorId, configJson}) { (2)
	const collector = new Collector({name, collectorId, configJson}); (3)

	if (!collector.config.uri) { (4)
		throw new Error('Config is missing required parameter uri!');
	}

	collector.start(); (5)

	const {
		uri,
		object: {
			someNestedProperty
		}
	} = collector.config; (6)

	while(somethingToDo) {
		if (collector.shouldStop()) { break; } (7)

		try {
			const {text, title} = doSomethingThatMayFail(); (8)

			collector.persistDocument({
				text,
				title,
				uri
			}); (9)

			collector.addSuccess({uri}); (10)

		} catch (e) {

			collector.addError({uri, message: e.message}); (11)

		}
	} // while somethingToDo

	// Perhaps delete documents that are no longer found...

	collector.stop(); (12)

} // export function run
1 Import the Collector class
2 The collect task gets passed three named parameters.
3 Construct a Collector instance.
4 Validate the configuration object.
5 Start the collector. Sets startTime and more.
6 Fetch configuration properties you need from the collector.config object.
7 Check if someone has clicked the STOP button.
8 This is where you collect the data you want to persist.
9 Persist the collected data.
10 Make a journal entry that collecting data from uri was a success.
11 Make a journal entry that an error prevented collecting data from uri.
12 Stop the collector. Sets endTime and more.

Step 4: Install and test

When you have built your collector application. Install the jar file on the Enonic XP server where you have Explorer installed. Then create a collection using your collector, and click collect to see what happens. It is a good idea to run locally first and keep an eye on the Enonic XP server log.

Contents