Solr Installation

Step 1: Install Solr on all Datanodes

[root@m1 ~]# yum install lucidworks-hdpsearch

Loaded plugins: fastestmirror

Setting up Install Process

Loading mirror speeds from cached hostfile

* base:

* epel:

* extras:

* updates:

epel/primary_db                                                                                                                                                                   | 5.8 MB     00:04

extras                                                                                                                                                                            | 3.4 kB     00:00

extras/primary_db                                                                                                                                                                | 37 kB     00:00

updates                                                                                                                                                                            | 3.4 kB     00:00

updates/primary_db                                                                                                                                                                 | 1.4 MB     00:04

Resolving Dependencies

--> Running transaction check

---> Package lucidworks-hdpsearch.noarch 0:2.3-4 will be installed

--> Finished Dependency Resolution


Dependencies Resolved



Package                                                 Arch                                     Version                                  Repository                                             Size



lucidworks-hdpsearch                                   noarch                                   2.3-4                                     HDP-UTILS-                                   681 M


Transaction Summary


Install       1 Package(s)


Total download size: 681 M

Installed size: 791 M

Is this ok [y/N]: y

Downloading Packages:

lucidworks-hdpsearch-2.3-4.noarch.rpm                                                                                                                                             | 681 MB     69:48

Running rpm_check_debug

Running Transaction Test

Transaction Test Succeeded

Running Transaction

Installing : lucidworks-hdpsearch-2.3-4.noarch                                                                                                                                                     1/1

Executing pre-install script

Distribution found: RedHat

Checking for available disk space …

Available space: 71% KB

Minimun space: 2907510 KB

/var/tmp/rpm-tmp.6Te05z: line 82: [: 71%: integer expression expected

Minimum required disk space available

Verifying installation directories…

Creating installation directory

Installation directory created: /opt/lucidworks-hdpsearch


Validating user …

Group solr already exists

User solr already exists


Checking java …

Java 1.7 found


Executing post-install script


Creating symbolic link for Solr …

Created symbolic link



Package lucidworks-hdpsearch was installed


Verifying : lucidworks-hdpsearch-2.3-4.noarch                                                                                                                                                      1/1



lucidworks-hdpsearch.noarch 0:2.3-4




Step 2: Create a soft link for the Solr logs:

ln -s /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr
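On a re-run, a plain ln -s fails because the link already exists. A minimal sketch of an idempotent variant (link_logs is a hypothetical helper name; on a real node you would call it with the paths from Step 2, the demo below uses a scratch directory instead):

```shell
#!/bin/sh
# Sketch: create the log symlink idempotently, safe to re-run on every node.
link_logs() {
  src="$1"; link="$2"
  # -s symbolic, -f replace an existing link, -n do not follow an existing
  # link to a directory (avoids accidentally creating logs/logs)
  ln -sfn "$src" "$link"
}

# Demo against a scratch directory; on a real node:
#   link_logs /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr
tmp=$(mktemp -d)
mkdir -p "$tmp/logs"
link_logs "$tmp/logs" "$tmp/solr"
readlink "$tmp/solr"
```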

Step 3: Configure SolrCloud

Since all Solr data will be stored in HDFS, it is important to increase the time Solr waits before it "kills" the Solr process on shutdown (whenever you execute "service solr stop/restart"). Shutting down takes longer when the indexes live in HDFS; with the default timeout, Solr kills the process prematurely, which usually leaves the indexes of your collections locked. If the index of a collection is locked, the following exception is shown after startup: "org.apache.solr.common.SolrException: Index locked for write".

Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr:

sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
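If you want to rehearse the substitution before touching the real script, you can run the same sed expression against a throwaway file that mimics the relevant line (the file below is scratch, not the actual bin/solr):

```shell
#!/bin/sh
# Rehearse the sed edit from Step 3 on a scratch file.
tmp=$(mktemp)
printf '    (sleep 5)\n' > "$tmp"

# Same substitution as above: "(sleep 5)" is matched literally,
# so only the shutdown kill-delay line changes.
sed -i 's/(sleep 5)/(sleep 30)/g' "$tmp"

cat "$tmp"
```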

Adjust the Solr configuration in /opt/lucidworks-hdpsearch/solr/bin/ so that Solr binds to the node's fully qualified hostname:


SOLR_HOST=`hostname -f`


Make sure the installation directory is owned by solr:

chown solr:solr /opt/lucidworks-hdpsearch/


Step 4: Create an HDFS directory for Solr. This directory will be used for all Solr data (indexes, etc.).

hdfs dfs -mkdir /apps/solr

hdfs dfs -chown solr /apps/solr

hdfs dfs -chmod 750 /apps/solr
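The three commands above can be bundled into one re-runnable helper. setup_solr_dir and its runner argument are illustrative names, not part of the product: pass echo to dry-run the commands, or an empty argument to execute them against the real cluster:

```shell
#!/bin/sh
# Sketch: Step 4 as an idempotent helper. The first argument is a
# hypothetical "runner" hook: "echo" prints the commands (dry run),
# an empty argument executes them for real.
setup_solr_dir() {
  runner="$1"
  $runner hdfs dfs -mkdir -p /apps/solr    # -p: no error if it already exists
  $runner hdfs dfs -chown solr /apps/solr  # solr must own its data directory
  $runner hdfs dfs -chmod 750 /apps/solr   # owner rwx, group r-x, others none
}

# Dry run -- prints the three commands instead of executing them:
setup_solr_dir echo
```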


Step 5: ZooKeeper: SolrCloud uses ZooKeeper to store configurations and cluster state. It's recommended to create a separate znode for Solr. The following commands can be executed on one of the Solr nodes.

Initialize the ZooKeeper znode for Solr:

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/ -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -cmd makepath /solr


Step 6: Adjust solrconfig.xml (/opt/lucidworks-hdpsearch/solr_collections/films/conf)

1) Remove any existing directoryFactory element

2) Add a new directoryFactory for HDFS (make sure to modify the value of solr.hdfs.home)

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">

<str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>

<str name="solr.hdfs.confdir">/etc/hadoop/conf</str>

<bool name="solr.hdfs.blockcache.enabled">true</bool>

<int name="solr.hdfs.blockcache.slab.count">1</int>

<bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>

<int name="solr.hdfs.blockcache.blocksperbank">16384</int>

<bool name="solr.hdfs.blockcache.read.enabled">true</bool>

<bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>

<int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>

<int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>

</directoryFactory>



Adjust the lock type:

Search for the lockType element and change it to "hdfs"
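After the change, the element in solrconfig.xml should look like this (in the stock configset, lockType sits inside the indexConfig element):

```xml
<indexConfig>
  <lockType>hdfs</lockType>
</indexConfig>
```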


Now push the config to ZooKeeper:

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/ -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -cmd upconfig -confname labs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf


Step 7: Start Solr on all the installed nodes.

[solr@m1 solr]$ bin/solr start -c -z m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs

You will see the following message on a successful start:

Started Solr server on port 8983 (pid=8744). Happy searching!


Step 8: Check the Solr service status on all the nodes.

 [solr@m1 solr]$ bin/solr status

Found 1 Solr nodes:

Solr process 8744 running on port 8983



"version":"5.2.1 1684708 - shalin - 2015-06-10 23:20:13",

"uptime":"0 days, 0 hours, 0 minutes, 6 seconds",

"memory":"83 MB (%16.9) of 490.7 MB",






Step 9: Now you can test your Solr cluster by creating a sample collection with shards and replicas.

[solr@m1 solr]$ /opt/lucidworks-hdpsearch/solr/bin/solr create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2

Connecting to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181

Uploading /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf for config test to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181


Creating new collection 'test' using command:

Now you can visit the Solr admin page and see that the cores have been created:


Step 10: Upload the sample file to HDFS if it is not already there.

[root@m2 solr]# hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv /csv/

Step 11: Now you can create an index on top of this data file in HDFS.

[root@m2 solr]# hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i csv/* -of -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr
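The -DcsvFieldMapping argument assigns a Solr field name to each zero-based CSV column. As a quick illustration of what that mapping does to a single row (a sample row in the shape of exampledocs/books.csv):

```shell
#!/bin/sh
# Illustrate -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author:
# each zero-based column becomes the named Solr field.
row='0553573403,book,A Game of Thrones,7.99,true,George R.R. Martin'

echo "$row" | awk -F',' '{
  printf "id=%s cat=%s name=%s price=%s instock=%s author=%s\n",
         $1, $2, $3, $4, $5, $6
}'
```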



The same job can be submitted to a specific YARN queue (-Dmapreduce.job.queuename) and to a different collection:

[solr@m1 solr]$ hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c sampletest -i csv/* -of -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr


Create an index from a specific HDFS location:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i /csv/books2.csv -of -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr


You can verify it by running some Solr queries via the UI or the command line.

[root@m2 solr]# curl "http://localhost:8983/solr/test_shard2_replica1/select?wt=json&indent=true&q=foundation&fl=id,name,price"

