Author Archives: elasticwarehouse


ElasticWarehouse version 1.2.3

Category : Uncategorized

Release note

  • version 1.2.3
    • bug fixes,
    • starting from this version, we support ES 2.x,
    • for ES 2.x version:
      • Tika upgraded to 1.11 (version for ES 1.x still uses Tika 1.7),
      • Kopf and Head plugins upgraded to latest versions,
      • builds done on Java 1.8

ElasticWarehouse standalone packages (to work in embedded and remote modes)

ElasticWarehouse plugin packages (to be hosted as ElasticSearch plugin)



ElasticWarehouse version 1.2.2

Category : Uncategorized

Release note

  • version 1.2.2
    • bug fixes

ElasticWarehouse standalone packages (to work in embedded and remote modes)

ElasticWarehouse plugin packages (to be hosted as ElasticSearch plugin)

 



ElasticWarehouse Plugin installation – known issues

Category : Uncategorized

The most common issues are related to jar dependencies. Because ES 2.x includes the JarHell checker, errors can appear during plugin installation as well as at runtime. Below we collect the most common issues and their solutions.

Runtime exceptions, java.lang.ExceptionInInitializerError or java.lang.ClassNotFoundException when uploading specific file formats to ElasticWarehouse cluster

ElasticWarehouse uses Tika to parse file contents and file metadata. Tika has many dependencies, and some of them must be available on the classpath to work correctly. The ElasticWarehouse package contains all required dependencies in the correct versions, but sometimes you need to add them to the classpath yourself. To do so, open the ElasticSearch startup script:

vim $ES_HOME/bin/elasticsearch.in.sh

Then edit the ES_CLASSPATH variable by appending the plugin folder (the two elasticwarehouseplugin entries at the end of the line below). Remember to use the correct plugin version (this example uses 1.2.2-2.1.0).

ES_CLASSPATH="$ES_HOME/lib/elasticsearch-2.1.0.jar:$ES_HOME/lib/*:$ES_HOME/plugins/elasticwarehouseplugin/*:$ES_HOME/plugins/elasticwarehouseplugin/elasticwarehouseplugin-1.2.2-2.1.0-jar-with-dependencies.jar"
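Because the version numbers in that line are easy to mistype, the value can also be generated. The following is a minimal sketch that reproduces the line above from the ElasticSearch version and the plugin version. The function name and its parameters are our own illustration and are not part of ElasticWarehouse or ElasticSearch.

```python
def build_es_classpath(es_version: str, plugin_version: str,
                       plugin_name: str = "elasticwarehouseplugin") -> str:
    """Return the ES_CLASSPATH line with the plugin folder and fat jar appended.

    $ES_HOME is kept as a literal shell variable because the result is meant
    to be pasted into elasticsearch.in.sh, not evaluated in Python.
    """
    plugin_dir = f"$ES_HOME/plugins/{plugin_name}"
    entries = [
        # Default ElasticSearch entries
        f"$ES_HOME/lib/elasticsearch-{es_version}.jar",
        "$ES_HOME/lib/*",
        # Entries added for ElasticWarehouse
        f"{plugin_dir}/*",
        f"{plugin_dir}/{plugin_name}-{plugin_version}-jar-with-dependencies.jar",
    ]
    return 'ES_CLASSPATH="' + ":".join(entries) + '"'
```

Calling build_es_classpath("2.1.0", "1.2.2-2.1.0") yields the same line as in the example above.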

This issue mostly occurs for:

  • *.atom (java.lang.NoClassDefFoundError: org/jdom/input/JDOMParseException)
  • *.xls, *.xlsx, *.ppt, *.pptx (java.lang.ClassNotFoundException: org.apache.poi.poifs.crypt.cryptoapi.CryptoAPIEncryptionInfoBuilder)

Installation error, java.lang.IllegalStateException

ElasticSearch 2.x uses the JarHell class to check dependencies. When the same class is provided by more than one jar, it prints an error like the one below and aborts the installation with an error code:

Exception in thread "main" java.lang.IllegalStateException: failed to load bundle [file:/opt/elasticwarehouseplugin-1.2.2-2.1.0-jar-with-dependencies.jar] due to jar hell
Likely root cause: java.lang.IllegalStateException: jar hell!
class: org.apache.poi.EmptyFileException
jar1: /home/user/workspace/elasticsearch-2.1.0/lib/poi-3.13.jar
jar2: /home/user/workspace/elasticsearch-2.1.0/plugins/elasticwarehouseplugin/elasticwarehouseplugin-1.2.2-2.1.0-jar-with-dependencies.jar
at org.elasticsearch.bootstrap.JarHell.checkClass(JarHell.java:280)
at org.elasticsearch.bootstrap.JarHell.checkJarHell(JarHell.java:186)
at org.elasticsearch.plugins.PluginsService.loadBundles(PluginsService.java:336)
at org.elasticsearch.plugins.PluginsService.<init>(PluginsService.java:109)
at org.elasticsearch.node.Node.<init>(Node.java:148)
at org.elasticsearch.node.Node.<init>(Node.java:129)
at org.elasticsearch.node.NodeBuilder.build(NodeBuilder.java:145)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:178)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:285)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details
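To find such conflicts before installing, you can look for classes that appear in more than one jar. The sketch below is a simplified approximation of that idea and is not the ElasticSearch JarHell implementation. It only compares .class entries between the jars you pass in, and the function names are our own.

```python
import zipfile
from collections import defaultdict
from pathlib import Path


def collect_jars(*folders):
    """Return all *.jar files located directly inside the given folders."""
    return sorted(p for folder in folders for p in Path(folder).glob("*.jar"))


def find_duplicate_classes(jar_paths):
    """Map each .class entry found in more than one jar to the jars containing it."""
    owners = defaultdict(list)
    for jar in jar_paths:
        with zipfile.ZipFile(jar) as archive:
            # set() guards against an entry listed twice inside the same jar
            for name in set(archive.namelist()):
                if name.endswith(".class"):
                    owners[name].append(str(jar))
    return {name: jars for name, jars in owners.items() if len(jars) > 1}
```

Run from the ElasticSearch home folder, find_duplicate_classes(collect_jars("lib", "plugins/elasticwarehouseplugin")) would report, for the error above, org/apache/poi/EmptyFileException.class together with the two jars that contain it.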

When this error occurs, the best approach is to install the ElasticWarehouse plugin package without bundled dependencies and copy all required *.jar dependencies manually to the “<elastic_search>/lib/” folder.

./bin/plugin install http://elasticwarehouse.effisoft.eu/elasticwarehouse/elasticsearch-elasticwarehouseplugin-1.2.2-2.1.0.zip

The list of dependencies can be taken from the pom.xml file.
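Reading the dependency list out of pom.xml by hand is tedious, so here is a small sketch that extracts it. It assumes a standard Maven POM that declares the usual http://maven.apache.org/POM/4.0.0 namespace, and it only reads direct dependencies. Versions defined through properties or a parent POM are returned as written and are not resolved. The function names are our own.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard Maven POM files
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def list_pom_dependencies(source):
    """Return (groupId, artifactId, version) for each direct dependency.

    `source` is a path to pom.xml or an open file object.
    """
    root = ET.parse(source).getroot()
    deps = []
    for dep in root.findall("./m:dependencies/m:dependency", POM_NS):
        deps.append((
            dep.findtext("m:groupId", default="", namespaces=POM_NS),
            dep.findtext("m:artifactId", default="", namespaces=POM_NS),
            dep.findtext("m:version", default="", namespaces=POM_NS),
        ))
    return deps


def jar_name(artifact_id, version):
    """Conventional Maven jar file name for an artifact without a classifier."""
    return f"{artifact_id}-{version}.jar"
```

For example, jar_name("poi", "3.13") gives poi-3.13.jar, the file name that appears in the jar hell message above.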

java.security.AccessControlException

Exception in thread "Thread-11" java.security.AccessControlException: access denied ("java.io.FilePermission" "/home/user/myfiles" "read")
at java.security.AccessControlContext.checkPermission(AccessControlContext.java:472)
at java.security.AccessController.checkPermission(AccessController.java:884)
at java.lang.SecurityManager.checkPermission(SecurityManager.java:549)
at java.lang.SecurityManager.checkRead(SecurityManager.java:888)
at java.io.File.list(File.java:1117)
at java.io.File.listFiles(File.java:1207)
at org.elasticwarehouse.core.parsers.FileTools.scanFolder(FileTools.java:82)
at org.elasticwarehouse.core.parsers.FileTools.scanFolder(FileTools.java:73)
at org.elasticwarehouse.tasks.ElasticWarehouseTaskScan.scanFolder(ElasticWarehouseTaskScan.java:155)
at org.elasticwarehouse.tasks.ElasticWarehouseTaskScan.access$200(ElasticWarehouseTaskScan.java:45)
at org.elasticwarehouse.tasks.ElasticWarehouseTaskScan$1.run(ElasticWarehouseTaskScan.java:111)

Solution 1:
Check that the user running ElasticSearch has read access to the provided location.
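A quick way to verify Solution 1 is to walk the folder and list everything the current user cannot read. This is a minimal sketch with a function name of our own. It checks operating system permissions only, not the Java security policy, and it must be run as the same user that runs ElasticSearch to be meaningful.

```python
import os


def unreadable_entries(folder):
    """Return paths under `folder` that the current OS user cannot read."""
    # A folder must be readable and traversable to be scanned at all
    if not os.access(folder, os.R_OK | os.X_OK):
        return [folder]
    blocked = []
    for root, dirs, files in os.walk(folder):
        for name in dirs:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK | os.X_OK):
                blocked.append(path)
        for name in files:
            path = os.path.join(root, name)
            if not os.access(path, os.R_OK):
                blocked.append(path)
    return blocked
```

If unreadable_entries("/home/user/myfiles") returns an empty list and the exception still occurs, the cause is most likely the Java security policy described in Solution 2.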

Solution 2:
Edit <jre location>/lib/security/java.policy and add the following line inside a grant block to allow the application to read a folder outside its deployment directory:

permission java.io.FilePermission "/home/user/myfiles/-", "read";

Here /- means all files and sub-folders inside this folder, recursively. While investigating the issue, you may also temporarily grant all permissions:

grant {
    permission java.security.AllPermission;
};

Deploy an ElasticWarehouse instance as a master node in your cluster

Sometimes the easiest option is to add an ElasticWarehouse node to your ElasticSearch cluster instead of using the plugin. Such a node (configured with node.master=true and node.data=false in elasticsearch.yml) does not store any data. It joins your ElasticSearch cluster and serves as the ElasticWarehouse API node for it.



ElasticWarehouse version 1.2.0/1.2.1

Category : older versions

Release note

  • version 1.2.1
    • New attributes: last file access date, custom keywords, custom comments,
    • _ewtask
      • new task : action=rename
      • new search attributes : showrequest=true/false and allhosts=true/false
      • fixed error codes list
    • _ewinfo
      • possibility to get file info by its location (folder + filename) and not only by ID
      • returns information about corresponding tasks (if any)
      • interface to set custom keywords and custom comments
    • _ewget
      • returns additional header when file successfully downloaded (EwStatusFound: OK)
    • _ewsearch
      • possibility to combine _all, folder and GEO location fields in single request
    •  Fixes:
      • fix for remote mode
      • performance collector issues on Windows
      • other
    • New version of ewshell included
    • Split binary content and indexed values between separate types
    • Limited number of supported underlying elasticsearch versions

ElasticWarehouse standalone packages (to work in embedded and remote modes)

ElasticWarehouse plugin packages (to be hosted as ElasticSearch plugin)

 



Introduction

Category : latest version

The goal of ElasticWarehouse is to organize your files, make them searchable, and take care of fault tolerance. With ElasticWarehouse you can store terabytes of data in a data cloud. In this guide you will learn how to install and configure an ElasticWarehouse cluster, how to import your files into the cluster, and how to access them using the simple or advanced API. For advanced usage it helps to understand how ElasticSearch and Lucene work, because ElasticWarehouse is built on top of them.

ElasticWarehouse is an open-source project and is not affiliated with Elastic.co, except for the fact that it is built on top of ElasticSearch.



Installation in 3 steps

Category : latest version

  1. Download the latest standalone ElasticWarehouse package
  2. Extract the official ElasticWarehouse distribution (zip or tar.gz) to /opt/elasticwarehouse (C:\opt\elasticwarehouse on Windows) or to any other location of your choice.
    cd /opt/elasticwarehouse
    tar -zxf elasticwarehouse-latest.tar.gz
    
  3. Launch elasticwarehouse.sh (ElasticWarehouse.bat on Windows)

Once launched, ElasticWarehouse creates a node client and either creates a new cluster or connects to an existing one using multicast discovery.

Here is the output of a successful run:

What’s next?

Check ElasticWarehouse status:

curl -X GET http://localhost:10200/
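The same check can be scripted, for example to wait until a freshly started node responds. The sketch below assumes only what the curl command shows: an HTTP endpoint on port 10200. It treats an HTTP 200 answer as "up" and makes no assumption about the response body. The function names are our own.

```python
import urllib.request


def status_url(host="localhost", port=10200):
    """Build the status URL used in the curl example."""
    return f"http://{host}:{port}/"


def is_up(host="localhost", port=10200, timeout=5.0):
    """Return True if the status endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(status_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses
        return False
```

Calling is_up() with the defaults queries the same address as the curl command above.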

Start more servers …

