Collector JavaScript API
Introduction
The Collector JS API is available to Enonic applications through the lib-explorer dependency. Like the ingest API, the collector API provides what you need to work with collections and documents.
Building your own collector? Check out the collector starter tutorial.
Usage
Add the following to your build.gradle file:
dependencies {
    include "com.enonic.lib:lib-explorer:4.x.x"
}
In your task controller, import the Collector class:
import { Collector } from '/lib/explorer';
Create the Collector object within the run function. Explorer passes the parameters to the task automatically.
export function run({
    collectionId,
    collectorId,
    configJson,
    language
}) {
    // CollectorConfig is a type you define to describe your collector's parsed configuration.
    const collector = new Collector<CollectorConfig>({
        collectionId, collectorId, configJson, language
    });
    // ...
}
You are now ready to use the API.
Constructor
Constructor for the Collector class.
Description
- Throws if any of the parameters are missing or invalid.
- Gets information about the Collection.
- Parses the configJson and provides the result via the config property.
Parameters
An object with the following keys and their values:
Name | Kind | Details |
---|---|---|
collectionId | string | Id of the collection to use |
collectorId | string | Id of the collector (i.e. |
configJson | object | The collector configuration |
language | string | A valid locale |
Returns
Collector object
Methods
These are the available methods of the Collector object
start
Description
- Stores a startTime timestamp used in duration calculations.
- Reports initial progress via lib-task.
- Creates a new collection repo (if needed).
- Persists information that the collector is running (so it can't be run twice in parallel, but can be stopped).
- Sets up a journal to write state information to.
Parameters
None
Returns
Void
Examples
collector.start();
queryDocuments
This method makes it possible to query the existing documents in the collection. It is useful for finding the id of an existing document so that you can modify it rather than create a new one.
Parameters
An object with the following keys and their values:
Name | Type | Attributes | Details |
---|---|---|---|
start | number | optional | Start index used for paging - default: 0 |
count | number | optional | Number of documents to fetch, used for paging - default: 10 |
query | string/object | | Query string or DSL expression. |
filters | object | optional | Query filters |
sort | string/object | optional | Sorting string or DSL expression. Default: '_score DESC' |
aggregations | object | optional | Aggregations config |
highlight | object | optional | Highlighting config |
explain | boolean | optional | If set to |
Returns
object : stats, hits and if requested, aggregations
Examples
collector.queryDocuments({
    start: 0,
    count: 2,
    query: "startTime > instant('2016-10-11T14:38:54.454Z')",
    filters: {
        boolean: {
            must: [
                {
                    exists: {
                        field: "modifiedTime"
                    }
                },
                {
                    exists: {
                        field: "other"
                    }
                }
            ],
            mustNot: {
                hasValue: {
                    field: "myField",
                    values: [
                        "cheese",
                        "fish",
                        "onion"
                    ]
                }
            }
        },
        notExists: {
            field: "unwantedField"
        },
        ids: {
            values: ["id1", "id2"]
        }
    },
    sort: "duration DESC",
});
{
    "total": 12902,
    "count": 2,
    "hits": [
        {
            "id": "b186d24f-ac38-42ca-a6db-1c1bda6c6c26",
            "score": 1.2300000190734863
        },
        {
            "id": "350ba4a6-589c-498b-8af0-f183850e1120",
            "score": 1.399999976158142
        }
    ]
}
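Because start and count page through the results, collecting every matching id means repeating the query until total hits have been seen. The following is a minimal sketch of that loop, not part of lib-explorer: QueryClient, makeFakeClient and collectAllIds are hypothetical names, and the in-memory client only mimics the total/count/hits shape shown above so that the example runs outside Enonic XP.

```typescript
// Shape of the result shown in the example above (aggregations omitted).
interface QueryHit { id: string; score: number }
interface QueryResult { total: number; count: number; hits: QueryHit[] }

// Hypothetical minimal view of a collector: only the paging parameters are modelled.
interface QueryClient {
  queryDocuments(params: { start?: number; count?: number; query: string }): QueryResult;
}

// In-memory stand-in (an assumption, for illustration only).
// It honours start/count and ignores the query itself.
function makeFakeClient(ids: string[]): QueryClient {
  return {
    queryDocuments({ start = 0, count = 10 }) {
      const page = ids.slice(start, start + count);
      return {
        total: ids.length,
        count: page.length,
        hits: page.map((id) => ({ id, score: 1 })),
      };
    },
  };
}

// Pages through all hits and returns their ids.
function collectAllIds(client: QueryClient, query: string, pageSize = 10): string[] {
  const found: string[] = [];
  let start = 0;
  for (;;) {
    const res = client.queryDocuments({ start, count: pageSize, query });
    for (const hit of res.hits) found.push(hit.id);
    start += res.count;
    // Also stop on an empty page, so a shrinking collection cannot loop forever.
    if (res.count === 0 || start >= res.total) break;
  }
  return found;
}

const fakeClient = makeFakeClient(['a', 'b', 'c', 'd', 'e']);
const allIds = collectAllIds(fakeClient, "startTime > instant('2016-10-11T14:38:54.454Z')", 2);
```

In a real task controller the collector object itself would take the place of fakeClient, assuming its queryDocuments result has the shape shown in the example output.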
persistDocument
This method will create or modify a document, based on its parameters. It can also extend a documentType and validate against it.
Parameters
Name | Kind | Details |
---|---|---|
document | object | The document to persist |
options | object | Options to use when persisting the document |
Document object
Name | Kind | Attributes | Details |
---|---|---|---|
_id | string | optional | Id of an existing document to modify |
_name | string | optional | Name of an existing document to modify |
_parentPath | string | optional | Parent path of an existing document to modify - default: '/' |
…rest | any | optional | Any other properties of the document to persist |
Options object
Name | Kind | Attributes | Details |
---|---|---|---|
boolRequireValid | boolean | optional | Whether a document must validate in order to be created or modified - default: false |
documentTypeName | string | required | Which documentType to use for indexing and to validate against |
Returns
The persisted document
Examples
const document = collector.persistDocument({
    text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.
More information...`,
    title: 'Example Domain',
    url: 'https://example.com/'
}, {
    boolRequireValid: true,
    documentTypeName: 'my_document_type'
});
log.info('Document persisted:%s', JSON.stringify(document, null, 4));
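Combining queryDocuments with the optional _id key gives an update-or-create pattern. The sketch below is an illustration under stated assumptions rather than lib-explorer code: UpsertClient, makeMemoryClient and upsertByUrl are hypothetical, the query string url = '...' assumes the collection stores a url field that can be matched with =, and the in-memory client exists only to make the example runnable.

```typescript
interface Doc { _id?: string; url: string; title: string }
type StoredDoc = Doc & { _id: string };
interface PersistOptions { documentTypeName: string; boolRequireValid?: boolean }

// Hypothetical subset of the Collector methods used here.
interface UpsertClient {
  queryDocuments(params: { query: string; count?: number }): { total: number; hits: { id: string }[] };
  persistDocument(doc: Doc, options: PersistOptions): StoredDoc;
}

// In-memory stand-in (an assumption): understands only queries of the form url = '<value>'
// and simply overwrites a stored document when an _id is given.
function makeMemoryClient(): UpsertClient & { size(): number } {
  const store = new Map<string, StoredDoc>();
  let nextId = 1;
  return {
    size: () => store.size,
    queryDocuments({ query }) {
      const match = /^url = '(.*)'$/.exec(query);
      const hits = [...store.values()]
        .filter((d) => match !== null && d.url === match[1])
        .map((d) => ({ id: d._id }));
      return { total: hits.length, hits };
    },
    persistDocument(doc) {
      const _id = doc._id ?? `doc-${nextId++}`;
      const saved: StoredDoc = { ...doc, _id };
      store.set(_id, saved);
      return saved;
    },
  };
}

// Looks for a document with the same url and reuses its id if one exists.
// Note: a url containing a single quote would need escaping before being put in the query.
function upsertByUrl(client: UpsertClient, doc: Doc, options: PersistOptions): StoredDoc {
  const existing = client.queryDocuments({ query: `url = '${doc.url}'`, count: 1 });
  const first = existing.hits[0];
  // Passing _id asks persistDocument to modify the existing document instead of creating one.
  return client.persistDocument(first ? { ...doc, _id: first.id } : doc, options);
}

const memoryClient = makeMemoryClient();
const persistOptions = { documentTypeName: 'my_document_type' };
const created = upsertByUrl(memoryClient, { url: 'https://example.com/', title: 'Example Domain' }, persistOptions);
const updated = upsertByUrl(memoryClient, { url: 'https://example.com/', title: 'Example Domain (v2)' }, persistOptions);
```

After both calls the store still holds a single document, because the second call found the first one by its url and reused its id.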
deleteDocument
This method makes it possible to delete one or more documents from the collection.
Parameters
Name | Kind | Details |
---|---|---|
keys | string or Array.<string> | Document keys to delete. Each argument can be an id or a path. Prefer ids over paths. |
Returns
Array.<string> : The list of keys that were actually deleted.
Examples
const deletedId = collector.deleteDocument('9aba4116-a219-4ccf-9f7a-17dc1486f82e');
'9aba4116-a219-4ccf-9f7a-17dc1486f82e'
const deletedIds = collector.deleteDocument(
    '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    '1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
);
[
    '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    '1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
]
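One possible use of deleteDocument is pruning documents that the current run no longer found at the source. The sketch below shows one way to compute that set; staleIds is a hypothetical helper, not part of lib-explorer, and the actual collector.deleteDocument call appears only in a comment because it needs a running Explorer task.

```typescript
// Returns the ids that exist in the collection but were not seen in this run.
function staleIds(existingIds: string[], seenIds: Iterable<string>): string[] {
  const seen = new Set(seenIds);
  return existingIds.filter((id) => !seen.has(id));
}

const existingIds = ['id1', 'id2', 'id3']; // e.g. gathered via queryDocuments
const seenThisRun = ['id1', 'id3'];        // ids persisted during the current run
const toDelete = staleIds(existingIds, seenThisRun);

// Inside a task controller the result could then be passed on, for example:
//   if (toDelete.length > 0) { collector.deleteDocument(...toDelete); }
```

Building a Set first keeps the comparison linear in the number of ids, which matters once a collection holds thousands of documents.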
getDocumentNode
This method makes it possible to get one or more documents from the collection.
Parameters
Name | Kind | Details |
---|---|---|
keys | string or Array.<string> | Document keys to get. Each argument can be an id or a path. Prefer ids over paths. |
Returns
Object | Array<Object> : The requested document, or an array of documents when several keys are given.
Examples
const node = collector.getDocumentNode('9aba4116-a219-4ccf-9f7a-17dc1486f82e');
{
    _id: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    _name: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    _path: '/9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission. More information...`,
    title: 'Example Domain',
    url: 'https://example.com/'
}
const node = collector.getDocumentNode(
    '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    '1f5fd4b6-1bfa-4a5b-adde-5241982ea200'
);
[{
    _id: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    _name: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    _path: '/9aba4116-a219-4ccf-9f7a-17dc1486f82e',
    text: `This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission. More information...`,
    title: 'Example Domain',
    url: 'https://example.com/'
}, {
    _id: '1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
    _name: '1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
    _path: '/1f5fd4b6-1bfa-4a5b-adde-5241982ea200',
    text: `fnord`,
    title: 'Fnord',
    url: 'https://fnord.com/'
}]
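Since the examples show a single object for one key and an array for several, calling code may want to normalise the result before iterating. A small sketch of such a helper follows; asArray and the DocumentNode type are hypothetical and not part of lib-explorer, and the sample nodes are shortened copies of the ones above.

```typescript
// Wraps a single value in an array and returns arrays unchanged.
function asArray<T>(value: T | T[]): T[] {
  return Array.isArray(value) ? value : [value];
}

// Hypothetical, trimmed-down node type for the illustration.
interface DocumentNode { _id: string; title: string }

const singleResult: DocumentNode = {
  _id: '9aba4116-a219-4ccf-9f7a-17dc1486f82e',
  title: 'Example Domain',
};
const multiResult: DocumentNode[] = [
  singleResult,
  { _id: '1f5fd4b6-1bfa-4a5b-adde-5241982ea200', title: 'Fnord' },
];

// Both shapes can now be handled by the same loop.
const titlesFromSingle = asArray(singleResult).map((n) => n.title);
const titlesFromMulti = asArray(multiResult).map((n) => n.title);
```

In a task controller the argument would be the return value of collector.getDocumentNode(...), under the assumption that it behaves as in the examples above.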
shouldStop
This method checks whether the STOP button has been clicked in the Explorer Admin GUI.
Useful to finish gracefully, for instance by breaking loops.
Parameters
None
Returns
Boolean : Whether the STOP button has been clicked.
Examples
while (!collector.shouldStop() && moreToDo) {
    // do something
}
addSuccess
Adds a success entry to the journal.
Parameters
An object with the following keys and their values:
Name | Kind | Details |
---|---|---|
message | string | The success message to add to the journal |
Returns
Void
Examples
collector.addSuccess({
    message: `The document scraped from ${url} was persisted successfully :)`
});
addInformation
Adds an info entry to the journal.
Parameters
An object with the following keys and their values:
Name | Kind | Details |
---|---|---|
message | string | The information message to add to the journal |
Returns
Void
Examples
collector.addInformation({
    message: `While scraping ${url} something interesting was found.`
});
addWarning
Adds a warning entry to the journal.
Parameters
An object with the following keys and their values:
Name | Kind | Details |
---|---|---|
message | string | The warning message to add to the journal |
Returns
Void
Examples
collector.addWarning({
    message: `${url} isn't available today`
});
addError
Adds an error entry to the journal.
Parameters
An object with the following keys and their values:
Name | Kind | Details |
---|---|---|
message | string | The error message to add to the journal |
Returns
Void
Examples
try {
    // do something that fails
} catch (e) {
    collector.addError({
        message: `It's a real problem that ${url} isn't available :(`
    });
}
stop
Description
- Persists the journal to the journal repo.
- Sends emails if notifications are configured in the Explorer Admin GUI.
- Persists information that the collector has failed or finished (so it can be started again).
Parameters
None
Returns
Void
Examples
collector.stop();
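Taken together, the methods suggest a typical run: start, a loop guarded by shouldStop, one journal entry per item, and stop in a finally block so that the collector is released even when something fails. The following sketch is an assumption about how these calls might be combined, not code from lib-explorer; LifecycleClient, makeRecordingClient, runCollector and the fetchPage callback are hypothetical, and the recording client stands in for a real Collector so the example runs outside Enonic XP.

```typescript
// Hypothetical subset of the Collector methods used in this sketch.
interface LifecycleClient {
  start(): void;
  shouldStop(): boolean;
  persistDocument(doc: { url: string; text: string }, options: { documentTypeName: string }): unknown;
  addSuccess(entry: { message: string }): void;
  addError(entry: { message: string }): void;
  stop(): void;
}

// Recording stand-in (an assumption): writes each call to a plain string array.
function makeRecordingClient(log: string[]): LifecycleClient {
  return {
    start: () => { log.push('start'); },
    shouldStop: () => false,
    persistDocument: (doc) => doc,
    addSuccess: ({ message }) => { log.push(`success: ${message}`); },
    addError: ({ message }) => { log.push(`error: ${message}`); },
    stop: () => { log.push('stop'); },
  };
}

function runCollector(
  client: LifecycleClient,
  urls: string[],
  fetchPage: (url: string) => string, // hypothetical: returns the page text or throws
): void {
  client.start();
  try {
    for (const url of urls) {
      if (client.shouldStop()) break; // finish gracefully when STOP is clicked
      try {
        const text = fetchPage(url);
        client.persistDocument({ url, text }, { documentTypeName: 'my_document_type' });
        client.addSuccess({ message: `Persisted ${url}` });
      } catch (e) {
        const reason = e instanceof Error ? e.message : String(e);
        client.addError({ message: `Failed ${url}: ${reason}` });
      }
    }
  } finally {
    client.stop(); // runs even if the loop throws, so the journal is still persisted
  }
}

const journalLog: string[] = [];
runCollector(
  makeRecordingClient(journalLog),
  ['https://example.com/', 'https://fnord.com/'],
  (url) => {
    if (url.includes('fnord')) throw new Error('unavailable');
    return 'page text';
  },
);
```

Catching errors per item keeps one failing URL from aborting the whole run, while the outer finally ensures stop is still called if something unexpected escapes the loop.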