ElasticWarehouse is distributed with a default configuration optimized for the most common setups. Building an ElasticSearch cluster can be a very complex project, so we refer you to the https://www.elastic.co/ website for more information about it. Here we focus on basic cluster configuration only.
ElasticWarehouse configuration files are in the config folder:
ls -l /opt/elasticwarehouse/config/
elasticsearch.yml is the ElasticSearch configuration file. It is used when ElasticWarehouse starts in embedded mode (the default mode). In this mode ElasticWarehouse creates an ElasticSearch data node and tries to connect to an existing cluster (defined in cluster.name) using multicast discovery.
To change the node configuration, edit elasticsearch.yml and restart the node. More information about this configuration file can be found in the ElasticSearch documentation.
elasticwarehouse.yml is the main ElasticWarehouse configuration file. See the table below for more details:
|Group||Attribute||Type||Default||Description|
|Mode definition||mode.embedded||boolean||true||Defines the working mode of the ElasticWarehouse instance: true for embedded mode, false for remote mode.|
|Remote mode specific||elasticsearch.cluster||string||elasticwarehouse||Defines the name of the cluster to connect to when the instance works in remote mode (when mode.embedded is false).|
|Embedded mode specific||grafana.port||int||10500||Defines the port Grafana listens on. If binding fails, ElasticWarehouse tries the next available port, i.e. 10501, 10502, 10503, etc.|
|ElasticSearch index definitions||elasticsearch.template.storage.name||string||elasticwarehousestorage||Should be the same as elasticsearch.index.storage.name.|
|ElasticSearch index definitions||elasticsearch.template.tasks.name||string||elasticwarehousetasks||Should be the same as elasticsearch.index.tasks.name.|
|ElasticSearch index definitions||elasticsearch.index.storage.name||string||elasticwarehousestorage||Name of the index used to store files.|
|ElasticSearch index definitions||elasticsearch.index.storage.type||string||files||Type inside the index used to store files. You can manually access files via the ElasticSearch REST API, like: http://<host>:<port>/index/type/_search|
|ElasticSearch index definitions||elasticsearch.index.storage.childtype||string||childfiles||Each file uploaded to the ElasticWarehouse cluster is parsed to get as much information about it as possible (e.g. EXIF data for images, text content for PDF files). Some files, like PDF or WORD, may contain embedded files (such as images, attachments or OLE objects). ElasticWarehouse extracts all such embedded files and stores them in a separate child type (one file stored in "type" may have many references to the "childfiles"). Thanks to that, ElasticWarehouse is able to search in a more advanced way.|
|ElasticSearch index definitions||elasticsearch.index.tasks.name||string||elasticwarehousetasks||Each operation, like folder creation, file scan or upload, is asynchronous and logged as a task. This attribute defines the name of the index that keeps the history of all tasks (see the _ewtask REST endpoint for more details).|
|ElasticSearch index definitions||elasticsearch.index.tasks.type||string||tasks||Type inside the index used to store tasks (data is stored inside a type, not directly inside the index). You can manually access tasks via the ElasticSearch REST API, like: http://<host>:<port>/index/type/_search|
|Global settings||elasticwarehouse.api.port||int||10200||Defines the port the API listens on. If binding fails, ElasticWarehouse tries the next available port, i.e. 10201, 10202, 10203, etc.|
|Global settings||log.level||string||DEBUG||Log level. To limit log file size use INFO, WARN or ERROR.|
|Global settings||path.tmp||string||/tmp||Temp folder location.|
|Global settings||exclude.files||string||avi mp4 mkv||List of file extensions to be excluded and rejected by the cluster.|
|Global settings||thumb.size||int||360||ElasticWarehouse generates a thumbnail for every image uploaded to the cluster. Available sizes: 90, 180, 360, 720.|
|Global settings||tasks.max.number||int||2||Maximum number of asynchronous tasks to be executed (for example, scan is an asynchronous task; see _ewtask for more details).|
|Global settings||rrd.db.path||string||data folder||ElasticWarehouse logs performance counters for monitoring purposes. By default, ElasticWarehouse creates all RRD databases in the same folder where ElasticSearch creates its Lucene indices.|
|Global settings||rrd.hostname||string||local hostname||Set this attribute explicitly when you run several ElasticWarehouse instances (nodes) on the same machine. If not set, the hostname is used.|
|Global settings||rrd.enabled||boolean||true||Set to false to disable the performance counters collector.|
|Global settings||store.content||boolean||true||When store.content=true, ElasticWarehouse behaves as a data cloud: it stores extracted file meta information and file content inside the index. When store.content=false, ElasticWarehouse behaves as a data indexer only: it doesn't store binary file content, only the path to the original file. When you set it to "false" you must configure store.folder.|
|Global settings||store.folder||string||/opt/upload||When you upload a file to ElasticWarehouse via _ewupload and store.content=false, the file content is saved to this folder.|
|Global settings||store.movescanned||boolean||false||When you use the "scan" task to import files into the ElasticWarehouse cluster, you can choose whether to make a copy of the original file or not. The copy is placed in the location defined in store.folder.|
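The port fallback described for grafana.port and elasticwarehouse.api.port can be pictured with a short sketch. This is not ElasticWarehouse's own code: the function name bind_first_free_port, the max_attempts limit and the loopback host are assumptions made only for illustration.

```python
import socket


def bind_first_free_port(start_port, host="127.0.0.1", max_attempts=20):
    """Bind to start_port, or to the next port that can be bound.

    Hypothetical helper: returns (socket, port actually used).
    """
    last_port = min(start_port + max_attempts, 65536)
    for port in range(start_port, last_port):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            # Port is taken or not bindable: release the socket, try the next one.
            sock.close()
            continue
        return sock, port
    raise OSError(f"no free port in range {start_port}..{last_port - 1}")
```

Calling bind_first_free_port(10200) would return port 10200 when it is free, and otherwise the first following port that could be bound.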
The configuration file is loaded when ElasticWarehouse starts, so after each configuration change you must restart your ElasticWarehouse instance.
Note that some configuration changes, like thumb.size, store.content, store.folder, store.movescanned, rrd.db.path etc., may require additional manual maintenance work, so change them with care.
For cases when you change thumb.size, we prepared a dedicated task, “rethumb”. This task recreates all thumbnails according to the currently loaded settings.
By default, logs are stored in the logs folder (e.g. c:\opt\elasticwarehouse\logs or /opt/elasticwarehouse/logs). The logs folder and log format can be changed by editing the log4j.properties file stored in the working folder of the ElasticWarehouse process, e.g. c:\opt\elasticwarehouse\bin\log4j.properties or /opt/elasticwarehouse/bin/log4j.properties.
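Because a restart is needed for every change, it can help to check a few of the constraints from the table before restarting. The sketch below is an illustration only: it assumes the settings have already been read into a flat Python dict keyed by attribute name, and check_settings is a hypothetical helper, not part of ElasticWarehouse. It also reads "you must configure store.folder" strictly, flagging a missing store.folder even though the table lists a default.

```python
ALLOWED_THUMB_SIZES = {90, 180, 360, 720}


def check_settings(settings):
    """Return a list of problems found in a flat settings dict (hypothetical helper)."""
    problems = []

    # thumb.size must be one of the sizes listed in the table.
    thumb = settings.get("thumb.size", 360)
    if thumb not in ALLOWED_THUMB_SIZES:
        problems.append(f"thumb.size={thumb} is not one of {sorted(ALLOWED_THUMB_SIZES)}")

    # Template names should match the corresponding index names.
    for kind in ("storage", "tasks"):
        default = f"elasticwarehouse{kind}"
        template = settings.get(f"elasticsearch.template.{kind}.name", default)
        index = settings.get(f"elasticsearch.index.{kind}.name", default)
        if template != index:
            problems.append(f"{kind}: template name {template!r} differs from index name {index!r}")

    # Strict reading: store.folder must be set explicitly when content is not stored.
    if settings.get("store.content", True) is False and not settings.get("store.folder"):
        problems.append("store.content=false requires store.folder to be configured")

    return problems
```

An empty list means none of these three checks failed; it does not prove the whole configuration is valid.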
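One way to spot such changes before a restart is to compare the old and new settings and list only the attributes named above. This is a minimal sketch under the assumption that both configurations are available as flat dicts; changed_maintenance_keys is a hypothetical name, and the key set is not exhaustive because the list above ends with "etc.".

```python
# Attributes named in the text as needing manual maintenance after a change.
# The source list ends with "etc.", so this set is not exhaustive.
MAINTENANCE_KEYS = {
    "thumb.size",
    "store.content",
    "store.folder",
    "store.movescanned",
    "rrd.db.path",
}


def changed_maintenance_keys(old, new):
    """Return the sorted maintenance-sensitive keys whose values differ."""
    return sorted(k for k in MAINTENANCE_KEYS if old.get(k) != new.get(k))
```

Changes to other attributes, such as log.level, are ignored by this helper on purpose.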