Configuration


ElasticWarehouse is distributed with a default configuration suited to the most common setups. Building an ElasticSearch cluster can be a very complex project, so please refer to https://www.elastic.co/ for more information about it. Here we focus on basic cluster configuration only.

Main configuration

ElasticWarehouse configuration files are located in the config folder:

ls -l /opt/elasticwarehouse/config/

elasticsearch.yml
elasticwarehouse.yml

elasticsearch.yml is the ElasticSearch configuration file. It is used when ElasticWarehouse starts in embedded mode (the default mode). In this mode ElasticWarehouse creates an ElasticSearch data node and tries to connect to an existing cluster (defined in cluster.name) using multicast discovery.
To change the node configuration, edit elasticsearch.yml and restart the node. More information about this configuration file is available in the ElasticSearch documentation.
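
As a rough illustration of the settings involved, a minimal elasticsearch.yml for embedded mode might look like the sketch below. The cluster and node names are hypothetical values chosen for the example, not confirmed defaults of the shipped file. The multicast setting is the one used by ElasticSearch 1.x and is assumed here to match the bundled ElasticSearch version.

```yaml
# elasticsearch.yml (embedded mode): illustrative sketch only.

# Name of the cluster the embedded data node tries to join.
# "elasticwarehouse" is an assumed value; use your cluster's name.
cluster.name: elasticwarehouse

# Hypothetical node name, useful when several nodes share one host.
node.name: ew-node-1

# Multicast discovery, as available in ElasticSearch 1.x.
# Assumption: the bundled ElasticSearch version still supports it.
discovery.zen.ping.multicast.enabled: true
```

After editing the file, restart the node so the new settings are picked up.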

elasticwarehouse.yml is the main ElasticWarehouse configuration file. See the table below for more details:

Group | Key | Type | Default value | Description
--- | --- | --- | --- | ---
Mode definition | mode.embedded | boolean | true | Defines the ElasticWarehouse instance work mode: true for embedded, false for remote.
Remote mode specific | elasticsearch.cluster | string | elasticwarehouse | Name of the cluster to connect to when the instance works in remote mode (when mode.embedded is false).
Remote mode specific | elasticsearch.hosts | string | n/a | Hosts to connect to, in the form host1,host2:port
Embedded mode specific | grafana.port | int | 10500 | Port Grafana listens on. If binding fails, ElasticWarehouse tries the next available port, e.g. 10501, 10502, 10503, etc.
ElasticSearch index definitions | elasticsearch.template.storage.name | string | elasticwarehousestorage | Should be the same as elasticsearch.index.storage.name
ElasticSearch index definitions | elasticsearch.template.tasks.name | string | elasticwarehousetasks | Should be the same as elasticsearch.index.tasks.name
ElasticSearch index definitions | elasticsearch.index.storage.name | string | elasticwarehousestorage | Name of the index used to store files.
ElasticSearch index definitions | elasticsearch.index.storage.type | string | files | Type inside the index used to store files. You can access files manually via the ElasticSearch REST API, e.g. http://<host>:<port>/index/type/_search
ElasticSearch index definitions | elasticsearch.index.storage.childtype | string | childfiles | Each file uploaded to the ElasticWarehouse cluster is parsed to extract as much information about it as possible (e.g. EXIF data for images, text content for PDF files). Some files, such as PDF or Word documents, may contain embedded files (images, attachments or OLE objects). ElasticWarehouse extracts all such embedded files and stores them in a separate child type (one file stored in "type" may have many references to "childfiles"). This allows ElasticWarehouse to search in a more advanced way.
ElasticSearch index definitions | elasticsearch.index.tasks.name | string | elasticwarehousetasks | Each operation, such as folder creation, file scan or upload, is asynchronous and logged as a task. This attribute defines the name of the index that keeps the task history (see the _ewtask REST endpoint for more details).
ElasticSearch index definitions | elasticsearch.index.tasks.type | string | tasks | Data is stored inside a type, not directly inside the index. You can access tasks manually via the ElasticSearch REST API, e.g. http://<host>:<port>/index/type/_search
Global settings | elasticwarehouse.api.port | int | 10200 | Port the API listens on. If binding fails, ElasticWarehouse tries the next available port, e.g. 10201, 10202, 10203, etc.
Global settings | log.level | string | DEBUG | Log level. To limit log file size use INFO, WARN or ERROR.
Global settings | path.tmp | string | /tmp | Temp folder location.
Global settings | exclude.files | string | avi mp4 mkv | List of file extensions to be excluded and rejected by the cluster.
Global settings | thumb.size | int | 360 | ElasticWarehouse generates a thumbnail for every image uploaded to the cluster. Available sizes: 90, 180, 360, 720.
Global settings | tasks.max.number | int | 2 | Maximum number of asynchronous tasks to be executed (e.g. a scan is an asynchronous task; see _ewtask for more details).
Global settings | rrd.db.path | string | data folder | ElasticWarehouse logs performance counters for monitoring purposes. By default, EW creates all RRD databases in the same folder where ElasticSearch creates its Lucene indices.
Global settings | rrd.hostname | string | localhost name | Set this attribute explicitly when you run several ElasticWarehouse instances (nodes) on the same machine. If not set, the hostname is used.
Global settings | rrd.enabled | boolean | true | Set to false to disable the performance counters collector.
Global settings | store.content | boolean | true | When store.content=true, ElasticWarehouse behaves as a data cloud: it stores the extracted file meta information and the file content inside the index. When store.content=false, ElasticWarehouse behaves as a data indexer only: it does not store the binary file content, only the path to the original file. When you set this to false you must configure store.folder.
Global settings | store.folder | string | /opt/upload | When you upload a file to ElasticWarehouse via _ewupload and store.content=false, the file content is saved to this folder.
Global settings | store.movescanned | boolean | false | When you use the "scan" task to import files into the ElasticWarehouse cluster, you can choose whether or not to make a copy of the original file. The copy is written to the location defined in store.folder.
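
To show how several of these keys combine, the sketch below configures an instance that connects to an existing cluster (remote mode) and works as an indexer only. It assumes that elasticwarehouse.yml accepts flat dotted keys exactly as they are written in the table. The host names and the port are hypothetical and should be replaced with the addresses of your own nodes.

```yaml
# elasticwarehouse.yml: illustrative sketch, not the shipped default file.
# Assumption: keys are written flat, exactly as listed in the table.

# Remote mode: connect to an existing cluster instead of starting
# an embedded ElasticSearch node.
mode.embedded: false
elasticsearch.cluster: elasticwarehouse

# Hypothetical hosts in the documented host1,host2:port form.
# 9300 is ElasticSearch's default transport port; that ElasticWarehouse
# expects the transport port here is an assumption.
elasticsearch.hosts: es-node1,es-node2:9300

# Indexer-only behaviour: keep file content out of the index.
store.content: false
# Required when store.content is false.
store.folder: /opt/upload

# Less verbose than the DEBUG default, to limit log file size.
log.level: INFO
```

Keys left out of the file are expected to keep the defaults listed in the table, though this has not been verified against the shipped configuration.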

The configuration file is loaded when ElasticWarehouse starts, so you must restart your ElasticWarehouse instance after each configuration change.

Note that some configuration changes, such as thumb.size, store.content, store.folder, store.movescanned or rrd.db.path, may require additional manual maintenance work, so change them with care.

For changes to thumb.size there is a dedicated task, "rethumb". This task recreates all thumbnails according to the currently loaded settings.

Logging configuration

Logs are stored in the logs folder by default (e.g. c:\opt\elasticwarehouse\logs or /opt/elasticwarehouse/logs). The logs folder and the log format can be changed by editing the log4j.properties file stored in the working folder of the ElasticWarehouse process, e.g. c:\opt\elasticwarehouse\bin\log4j.properties or /opt/elasticwarehouse/bin/log4j.properties.
