Why ElasticWarehouse

ElasticWarehouse is a file data store built on top of ElasticSearch. It is an open-source project funded by EffiSoft Poland.

Because ElasticWarehouse is built on top of ElasticSearch, it inherits all of its features. The most important aspects of an ElasticWarehouse cluster are listed below.

Read and Write Efficiency: ElasticWarehouse is designed so that a single cluster can work under high read and write load. Files are partitioned and spread over a cluster of machines, so applications can read, write and search for files efficiently using the power of many machines rather than a single one.

Clustering: ElasticWarehouse can distribute data across multiple nodes for load balancing and efficient reading, writing and searching.
Horizontal Scalability: By adding more nodes you can elastically and transparently increase the capacity of the cluster without downtime. Nodes can be brought up or shut down while the system continues to serve requests.
Fault Tolerance: There is no need to set up RAID-1; the cluster is fault tolerant thanks to replicas. The loss of any single node won't affect the stability of the cluster. The number of node failures the cluster can tolerate can be raised by increasing the number of replicas.
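
As a rough illustration of the relationship between replicas and fault tolerance, here is a minimal Python sketch. It assumes the standard ElasticSearch behavior that copies of the same shard are never placed on the same node, and it builds (but does not send) a request for the ElasticSearch update-settings endpoint. The index name "my-files" is hypothetical and is not an ElasticWarehouse default; the helper function names are made up for this example.

```python
import json


def copies_per_shard(replicas: int) -> int:
    """Each shard exists as one primary plus `replicas` replica copies."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return 1 + replicas


def tolerated_node_losses(replicas: int) -> int:
    """Nodes that can be lost while at least one copy of every shard survives.

    Assumes every replica is assigned, i.e. the cluster has at least
    `replicas + 1` nodes, so the copies of a shard sit on distinct nodes.
    """
    return copies_per_shard(replicas) - 1


def replica_settings_request(index: str, replicas: int) -> tuple:
    """Build (method, path, body) for the ElasticSearch update-settings API."""
    body = json.dumps({"index": {"number_of_replicas": replicas}})
    return ("PUT", "/" + index + "/_settings", body)


if __name__ == "__main__":
    # "my-files" is a hypothetical index name used only for illustration.
    method, path, body = replica_settings_request("my-files", 2)
    print(method, path, body)
    print("tolerated node losses:", tolerated_node_losses(2))
```

With two replicas every shard has three copies, so under the stated assumption the data survives the loss of any two nodes.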

Custom endpoints: ElasticWarehouse has custom REST API endpoints and a custom JSON API. In advanced mode you can always send requests directly to the underlying ElasticSearch layer.
Full Text Search: One of the biggest features of an ElasticWarehouse cluster is the ability to return not just files, but files whose content contains words related or relevant to the search keywords. The list of all supported file formats is here: http://tika.apache.org/1.7/formats.html
Geosearch: Supports geographic search for documents that carry geo information (such as photos).
Versioning: Each file modification gets its own unique version number.
Shell tools: You can access files stored in an ElasticWarehouse cluster as if they were on a regular filesystem.
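
To give an idea of what a request sent directly to the ElasticSearch layer might look like, the sketch below builds a JSON body for the ElasticSearch _search endpoint using a standard match query. The field name "content" is an assumption made for this example; the actual field names in ElasticWarehouse's mapping are not described here, and the custom ElasticWarehouse endpoints are not shown.

```python
import json


def full_text_search_body(keywords: str, field: str = "content", size: int = 10) -> str:
    """Build a JSON body for ElasticSearch's _search endpoint.

    `field` defaults to "content", a hypothetical field name chosen
    for illustration only.
    """
    query = {
        "size": size,                            # maximum number of hits to return
        "query": {"match": {field: keywords}},   # analyzed full-text match
    }
    return json.dumps(query)


if __name__ == "__main__":
    # The body would be sent as the payload of a _search request.
    print(full_text_search_body("quarterly report"))
```

A match query analyzes the keywords before searching, which is what lets it find documents containing related word forms rather than only exact strings.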

ElasticWarehouse is integrated with Grafana and ElasticSearch plugins. You can easily attach ElasticWarehouse to your existing monitoring setup or to a separate Grafana instance, or use the embedded one. If you need custom ElasticSearch plugins, they can be added just as on a regular ElasticSearch cluster. ElasticWarehouse implements a simple Graphite-compatible API.

The default ElasticWarehouse work mode is embedded mode, but ElasticWarehouse can also cooperate with existing ElasticSearch clusters (remote mode) or even be deployed as a plugin on an existing ElasticSearch cluster (plugin mode).

Embedded Remote Plugin
Full ElasticWarehouse REST API features      
Ability to cooperate with an existing ElasticSearch cluster
Logging global OS performance counters (CPU, Memory, Network, Disks etc.)      
Logging performance counters specific for running instance      
Embedded Grafana instance      
Full ElasticSearch API features (HTTP REST API, Transport client, Node Client)      
Ability to create new cluster      

In Embedded mode ElasticWarehouse uses all features of the ES node client. This means each newly started ElasticWarehouse instance becomes part of an ElasticWarehouse or ElasticSearch cluster.


Figure 1. Two possible configurations in Embedded mode

In Remote mode ElasticWarehouse uses all features of the ES transport client. In this configuration ElasticWarehouse behaves like an ElasticSearch client.


Figure 2. Visualization of configuration in Remote mode

ElasticWarehouse is also distributed in the form of an ElasticSearch plugin. This configuration is useful when you have an existing ElasticSearch cluster and do not want to add more nodes to it. Note that the ElasticWarehouse plugin has limited features (see the table above).


Figure 3. Visualization of ElasticSearch cluster with ElasticWarehouse plugin deployed