Collector JavaScript API

Contents

Introduction

The collector JS API can be accessed via Enonic applicatoins. It is provisioned via the lib-explorer dependency. Similar to the ingest API, the collector API provides everything you need to work with collections and documents.

Building your own collector? Check out the collector starter tutorial

Usage

Add the following to your build.gradle file:

dependencies {
  include "com.enonic.lib:lib-explorer:4.x.x"
}

In your task controller, add a require statement:

import { Collector } from '/lib/explorer';

Create the Collector object within the run function. The paramers are automatically passed to the task from Explorer.

export function run({
	collectionId,
	collectorId,
	configJson,
	language
}) {
	const collector = new Collector<CollectorConfig>({
		collectionId, collectorId, configJson, language
	});
	// ...
}

You are now ready to use the API.

Constructor

Constructor for the Collector class.

Description

  • Throws if any of the parameters are missing or invalid.

  • Gets information about the Collection.

  • Parses the configJson and provides the result via the config property.

Parameters

An object with the following keys and their values:

Name Kind Details

collectionId

string

Id of the collection to use

collectorId

string

Id of the collector (i.e. com.example.app:collector)

configJson

object

The collector configuration

language

string

a valid locale

Returns

Collector object

Methods

These are the available methods of the Collector object

start

Description

  • Stores a startTime timestamp used in duration calculations.

  • Reports an initial progress via lib-task.

  • Creates a new collection repo (if needed).

  • Persists information that the collector is running (so it can’t be run twice in parallel, but can be stopped).

  • Sets up a journal to write state information to.

Parameters

None

Returns

Void

Examples

Calling the start method
collector.start();

queryDocuments

This method makes is possible to query the exisiting documents in the collection. Useful to find the document id to modify an existing document, rather than creating a new one.

Parameters

An object with the following keys and their values:

Name Type Attributes Details

start

number

optional

Start index used for paging - default: 0

count

number

optional

Number of content to fetch, used for paging - default: 10

query

string/object

Query string or DSL expression.

filters

object

optional

Query filters

sort

string/object

optional

Sorting string or DSL expression. Default: '_score DESC'

aggregations

object

optional

Aggregations config

highlight

object

optional

Highlighting config

explain

boolean

optional

If set to true, score calculation explanation will be included in result.

Returns

object : stats, hits and if requested, aggregations

Examples

Calling the queryDocuments method
collector.queryDocuments({
    start: 0,
    count: 2,
    query: "startTime > instant('2016-10-11T14:38:54.454Z')",
    filters: {
        boolean: {
            must: [
                {
                    exists: {
                        field: "modifiedTime"
                    }
                },
                {
                    exists: {
                        field: "other"
                    }
                }
            ],
            mustNot: {
                hasValue: {
                    field: "myField",
                    values: [
                        "cheese",
                        "fish",
                        "onion"
                    ]
                }
            }
        },
        notExists: {
            field: "unwantedField"
        },
        ids: {
            values: ["id1", "id2"]
        }
    },
    sort: "duration DESC",
});
Sample response
{
    "total": 12902,
    "count": 2,
    "hits": [
        {
            "id": "b186d24f-ac38-42ca-a6db-1c1bda6c6c26",
            "score": 1.2300000190734863
        },
        {
            "id": "350ba4a6-589c-498b-8af0-f183850e1120",
            "score": 1.399999976158142
        }
    ],
}

persistDocument

This method will create or modify a document, based on its parameters. It can also extend a documentType and validate against it.

Parameters

Name Kind Details

document

object

The document to persist

options

object

Options to use when persisting the document

Document object

Name Kind Attributes Details

_id

string

<optional>

Id of an exisiting document to modify

_name

string

<optional>

Name of an exisiting document to modify

_parentPath

string

<optional>

Parentpath of an exisiting document to modify - default: '/'

…​rest

any

<optional>

Any other properties of the document to persist

Options object

Name Kind Attributes Details

boolRequireValid

boolean

<optional>

Whether a document must validate in order to be created or modified - default: false

documentTypeName

string

<required>

Which documentType to use for indexing and validate against

Returns the persisted document

Examples

Calling the persistDocument method
const document = collector.persistDocument({
	text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.

More information...`,
	title: 'Example Domain',
	url: 'https://example.com/'
}, {
	boolRequireValid: true,
	documentTypeName: 'my_document_type'
});
log.info('Document persisted:%s', JSON.stringify(document, null, 4));

deleteDocument

This method makes it possible to delete one or more documents from the collection.

Parameters

Name Kind Details

keys

string | Array.<string>

Document keys to delete. Each argument could be an id or a path. Prefer the usage of ID rather than paths.

Returns

Array.<string> : The list of keys that were actually deleted.

Examples

Deleting a single document
const deletedId = collector.deleteDocument('9aba4116-a219-4ccf-9f7a-17dc1486f82e');
Sample response
'9aba4116-a219-4ccf-9f7a-17dc1486f82e'
Deleting multiple documents
const deletedIds = collector.deleteDocument(
	'9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	'1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
);
Sample response
[
	'9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	'1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
]

getDocumentNode

This method makes it possible to get one ore more documents from the collection.

Parameters

Name Kind Details

keys

string | Array.<string>

Document keys to get. Each argument could be an id or a path. Prefer the usage of ID rather than paths.

Returns

Object | Array<Object> : One or more gotten documents.

Examples

Getting a single document
const node = collector.getDocumentNode('9aba4116-a219-4ccf-9f7a-17dc1486f82e');
Sample response
{
	_id: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	_name: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	_path: '/9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission. More information...`,
	title: 'Example Domain',
	url: 'https://example.com/'
}
Getting a multiple documents
const node = collector.getDocumentNode(
	'9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	'1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
);
Sample response
[{
	_id: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	_name: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	_path: '/9aba4116-a219-4ccf-9f7a-17dc1486f82e',
	text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission. More information...`,
	title: 'Example Domain',
	url: 'https://example.com/'
}, {
	_id: '1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
	_name: '1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
	_path: '/1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
	text: `fnord`,
	title: 'Fnord',
	url: 'https://fnord.com/'
}]

shouldStop

This method checks whether the STOP button has be clicked in the Explorer Admin GUI.

Useful to finish gracefully, for instance by breaking loops.

Parameters

None

Returns

Boolean : Whether the STOP button has been clicked.

Examples

Calling the shouldStop method
while (!collector.shouldStop() && moreToDo) {
	// do something
}

addSuccess

Adds a success to the journal.

Parameters

An object with the following keys and their values:

Name Kind Details

message

string

The success message to add to the journal

Returns

Void

Examples

Calling the addSuccess method
collector.addSuccess({
	message: `The document scraped from ${url} was persisted successfully :)`
});

addInformation

Adds useful information to the journal.

Parameters

An object with the following keys and their values:

Name Kind Details

message

string

The information message to add to the journal

Returns

Void

Examples

Calling the addInformation method
collector.addInformation({
	message: `While scraping ${url} something interesting was found.`
});

addWarning

Adds a warning to the journal.

Parameters

Name Kind Details

message

string

The warning message to add to the journal

Returns

Void

Examples

Calling the addWarning method
collector.addWarning({
	message: `${url} isn't available today`
});

addError

Adds an error to the journal.

Parameters

Name Kind Details

message

string

The error message to add to the journal

Returns

Void

Examples

Calling the addError method
try {
	// do something that fails
} catch (e) {
	collector.addError({
		message: `It's a real problem that ${url} isn't available :(`
	});
}

stop

Description

  • Persists the journal to the journal repo.

  • Sends emails if notifications are configured on the Explorer Admin GUI.

  • Persists information that the collector has failed or finished (so it can be started again).

Parameters

None

Returns

Void

Examples

Calling the stop method
collector.stop();

Contents

Contents