Solr Installation

Step 1: Install Solr on all Datanodes

[root@m1 ~]# yum install lucidworks-hdpsearch

Loaded plugins: fastestmirror

Setting up Install Process

Loading mirror speeds from cached hostfile

* base: ftp.iitm.ac.in

* epel: ftp.jaist.ac.jp

* extras: ftp.iitm.ac.in

* updates: ftp.iitm.ac.in

epel/primary_db                                                                                                                                                                   | 5.8 MB     00:04

extras                                                                                                                                                                            | 3.4 kB     00:00

extras/primary_db                                                                                                                                                                | 37 kB     00:00

updates                                                                                                                                                                            | 3.4 kB     00:00

updates/primary_db                                                                                                                                                                 | 1.4 MB     00:04

Resolving Dependencies

--> Running transaction check

---> Package lucidworks-hdpsearch.noarch 0:2.3-4 will be installed

--> Finished Dependency Resolution

 

Dependencies Resolved

 

==========================================================================================================================================================================================================

Package                                                 Arch                                     Version                                  Repository                                             Size

==========================================================================================================================================================================================================

Installing:

lucidworks-hdpsearch                                   noarch                                   2.3-4                                     HDP-UTILS-1.1.0.20                                   681 M

 

Transaction Summary

==========================================================================================================================================================================================================

Install       1 Package(s)

 

Total download size: 681 M

Installed size: 791 M

Is this ok [y/N]: y

Downloading Packages:

lucidworks-hdpsearch-2.3-4.noarch.rpm                                                                                                                                             | 681 MB     69:48

Running rpm_check_debug

Running Transaction Test

Transaction Test Succeeded

Running Transaction

Installing : lucidworks-hdpsearch-2.3-4.noarch                                                                                                                                                     1/1

Executing pre-install script

Distribution found: RedHat

Checking for available disk space …

Available space: 71% KB

Minimun space: 2907510 KB

/var/tmp/rpm-tmp.6Te05z: line 82: [: 71%: integer expression expected

Minimum required disk space available

Verifying installation directories…

Creating installation directory

Installation directory created: /opt/lucidworks-hdpsearch

 

Validating user …

Group solr already exists

User solr already exists

 

Checking java …

Java 1.7 found

 

Executing post-install script

 

Creating symbolic link for Solr …

Created symbolic link

 

====

Package lucidworks-hdpsearch was installed

====

Verifying : lucidworks-hdpsearch-2.3-4.noarch                                                                                                                                                      1/1

 

Installed:

lucidworks-hdpsearch.noarch 0:2.3-4

 

Complete!

 

Step 2: Create a soft link for the Solr logs:

ln -s /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr

Step 3: Configure SolrCloud

Because all Solr data will be stored in HDFS, it is important to increase the time Solr waits before it kills the Solr process on shutdown (whenever you run "service solr stop" or "service solr restart"). Shutting down takes a little longer when the index lives on HDFS. If the wait is not increased, Solr gives up on the graceful shutdown and kills the process, which most of the time leaves the indexes of your collections locked. When a collection's index is locked, the following exception is shown after the next startup: "org.apache.solr.common.SolrException: Index locked for write".

Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr:

sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
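The sed expression only matches the literal text (sleep 5), so it silently does nothing if bin/solr looks different in your Solr version. A minimal sketch of a more defensive variant follows. It keeps a backup and reports whether the replacement landed. The function name bump_stop_wait is hypothetical; the pattern is the one from the command above.

```shell
#!/bin/sh
# Sketch: raise the stop wait in bin/solr and confirm the edit took effect.
# bump_stop_wait is an illustrative helper name, not part of Solr.
bump_stop_wait() {
  local script="$1"
  # Keep a copy of the original next to the script (suffix .bak).
  sed -i.bak 's/(sleep 5)/(sleep 30)/g' "$script"
  # In basic regex syntax the parentheses are literal, so this only
  # matches the exact text "(sleep 30)".
  if grep -q '(sleep 30)' "$script"; then
    echo "stop wait is now 30 seconds in $script"
  else
    echo "pattern '(sleep 5)' not found in $script; nothing changed" >&2
    return 1
  fi
}

# Usage on a Solr node (run as root):
# bump_stop_wait /opt/lucidworks-hdpsearch/solr/bin/solr
```

If the function reports that the pattern was not found, open bin/solr and look for the sleep in the stop routine before changing anything by hand.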

Adjust Solr configuration: /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh

SOLR_HEAP="1024m"

SOLR_HOST=`hostname -f`

ZK_HOST="m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr"

Make sure the installation directory is owned by the solr user:

chown solr:solr /opt/lucidworks-hdpsearch/
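The ZK_HOST value set above is a comma-separated list of host:port pairs followed by a single chroot path. As a sketch, it can be assembled from a host list so that the same string is reused in solr.in.sh and in later commands. build_zk_host is a hypothetical helper; port 2181 and the /solr chroot are the values used in this post.

```shell
#!/bin/sh
# Sketch: build a ZooKeeper connection string "h1:2181,h2:2181/chroot".
# build_zk_host is an illustrative name; 2181 is the port used in this post.
build_zk_host() {
  local chroot="$1"; shift
  local port=2181 joined="" host
  for host in "$@"; do
    # Prepend a comma only when something has already been joined.
    joined="${joined:+$joined,}${host}:${port}"
  done
  printf '%s%s\n' "$joined" "$chroot"
}

# Usage:
# ZK_HOST="$(build_zk_host /solr m1.hdp22 m2.hdp22 w1.hdp22)"
```

The chroot is appended once at the end of the whole list, not after each host.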

 

Step 4: Create an HDFS directory for Solr. This directory will be used for all the Solr data (indexes, etc.).

hdfs dfs -mkdir /apps/solr

hdfs dfs -chown solr /apps/solr

hdfs dfs -chmod 750 /apps/solr
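A sketch of the same three commands wrapped in a function, with the path and owner as parameters. The HDFS variable is an assumption added here so the commands can be previewed (for example with HDFS=echo) before they are run for real. The -p flag is also an addition; it lets mkdir succeed when the directory already exists.

```shell
#!/bin/sh
# Sketch: create and lock down the Solr directory in HDFS.
# HDFS defaults to the real "hdfs" client; set HDFS=echo for a dry run.
HDFS="${HDFS:-hdfs}"

prepare_solr_hdfs_dir() {
  local dir="$1" owner="$2"
  # Stop at the first failing command so a later step never runs
  # against a directory that was not created.
  $HDFS dfs -mkdir -p "$dir" &&
  $HDFS dfs -chown "$owner" "$dir" &&
  $HDFS dfs -chmod 750 "$dir"
}

# Dry run:  HDFS=echo; prepare_solr_hdfs_dir /apps/solr solr
# Real run (as the HDFS superuser):  prepare_solr_hdfs_dir /apps/solr solr
```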

 

Step 5: ZooKeeper: SolrCloud uses ZooKeeper to store configurations and cluster state. It is recommended to create a separate znode for Solr. The following command can be executed on any one of the Solr nodes.

Initialize the ZooKeeper znode for Solr:

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -cmd makepath /solr

 

Step 6: Adjust solrconfig.xml (/opt/lucidworks-hdpsearch/solr_collections/films/conf)

1) Remove any existing directoryFactory element

2) Add a new directory factory for HDFS (make sure solr.hdfs.home points to the HDFS directory you created for Solr)

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

 

Adjust the lock type

Find the lockType element and change its value to "hdfs":

<lockType>hdfs</lockType>
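Before uploading the configuration, it can help to confirm that the edited file really contains the HDFS directory factory and the hdfs lock type. A minimal sketch using grep follows. check_hdfs_solrconfig is a hypothetical helper. The patterns assume the elements are written literally as shown in this step, so a file that sets the lock type through a property placeholder would be reported as missing.

```shell
#!/bin/sh
# Sketch: sanity-check a solrconfig.xml for the HDFS settings from this step.
# check_hdfs_solrconfig is an illustrative name, not a Solr tool.
check_hdfs_solrconfig() {
  local file="$1" missing=0
  grep -q 'class="solr.HdfsDirectoryFactory"' "$file" || {
    echo "no HdfsDirectoryFactory in $file" >&2; missing=1; }
  grep -q 'name="solr.hdfs.home"' "$file" || {
    echo "no solr.hdfs.home in $file" >&2; missing=1; }
  grep -q '<lockType>hdfs</lockType>' "$file" || {
    echo "lockType is not hdfs in $file" >&2; missing=1; }
  # Exit status 0 only when all three settings were found.
  return "$missing"
}

# Usage:
# check_hdfs_solrconfig /path/to/conf/solrconfig.xml && echo "looks good"
```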

Now push the configuration to ZooKeeper:

/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr -cmd upconfig -confname labs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf

 

Step 7: Start Solr on all the nodes where it is installed.

[solr@m1 solr]$ bin/solr start -c -z m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs

You will see the following message on a successful start.

Started Solr server on port 8983 (pid=8744). Happy searching!

 

Step 8: Check the Solr service status on all the nodes.

[solr@m1 solr]$ bin/solr status

Found 1 Solr nodes:

Solr process 8744 running on port 8983

{
  "solr_home":"/opt/lucidworks-hdpsearch/solr/server/solr/",
  "version":"5.2.1 1684708 - shalin - 2015-06-10 23:20:13",
  "startTime":"2016-07-19T14:29:17.09Z",
  "uptime":"0 days, 0 hours, 0 minutes, 6 seconds",
  "memory":"83 MB (%16.9) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181",
    "liveNodes":"3",
    "collections":"0"}}
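With Solr started on three nodes, the liveNodes value in the status output should read 3 on each of them. A small sketch that pulls that number out of the status text so it can be compared in a script follows. check_live_nodes is a hypothetical helper, and the sed pattern assumes the output format shown above.

```shell
#!/bin/sh
# Sketch: compare the liveNodes count from "bin/solr status" to an expected value.
# Reads the status text on stdin; check_live_nodes is an illustrative name.
check_live_nodes() {
  local expected="$1" actual
  # Extract the digits from a line such as:  "liveNodes":"3",
  actual="$(sed -n 's/.*"liveNodes":"\([0-9][0-9]*\)".*/\1/p' | head -n 1)"
  if [ "$actual" = "$expected" ]; then
    echo "ok: $actual live nodes"
  else
    echo "expected $expected live nodes, found '${actual}'" >&2
    return 1
  fi
}

# Usage on a Solr node:
# bin/solr status | check_live_nodes 3
```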

 

Step 9: Now you can test your Solr cluster by creating a sample collection with two shards and two replicas.

[solr@m1 solr]$ /opt/lucidworks-hdpsearch/solr/bin/solr create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2

Connecting to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181

Uploading /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf for config test to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181

 

Creating new collection 'test' using command:

http://192.168.56.41:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=test

{
  "responseHeader":{
    "status":0,
    "QTime":14404},
  "success":{"":{
    "responseHeader":{
      "status":0,
      "QTime":13252},
    "core":"test_shard1_replica1"}}}

 

Now you can open the Solr Admin UI and see that the cores have been created.

Step 10: Upload a data file to HDFS if it is not already there.

[root@m2 solr]# hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv /csv/

Step 11: Now you can create an index on top of this data file in HDFS.

[root@m2 solr]# hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr

 

or

[solr@m1 solr]$ hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c sampletest -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr

 

Create an index from a specific input path:

hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i /csv/books2.csv -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr
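The ingest commands in this step differ only in the collection, the input path and the optional YARN queue. As a sketch, they can be folded into one function that takes those as arguments. ingest_csv and the HADOOP variable are assumptions added here (HADOOP=echo prints the command instead of running it). The jar path, class names and -D options are copied from the commands above.

```shell
#!/bin/sh
# Sketch: one wrapper for the CSV ingest job used in this step.
# HADOOP defaults to the real client; set HADOOP=echo for a dry run.
HADOOP="${HADOOP:-hadoop}"

# ingest_csv COLLECTION INPUT_PATH ZK_CONNECT [QUEUE]
ingest_csv() {
  local collection="$1" input="$2" zk="$3" queue="${4:-}"
  set -- jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar \
    com.lucidworks.hadoop.ingest.IngestJob
  # Only pass a queue name when one was given.
  if [ -n "$queue" ]; then
    set -- "$@" "-Dmapreduce.job.queuename=$queue"
  fi
  set -- "$@" \
    -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author \
    -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," \
    -Dlww.commit.on.close=true \
    -cls com.lucidworks.hadoop.ingest.CSVIngestMapper \
    -c "$collection" -i "$input" \
    -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk "$zk"
  $HADOOP "$@"
}

# Usage:
# ingest_csv test /csv/books2.csv m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr ado
```

Passing the input path as a quoted argument keeps a wildcard such as /csv/* from being expanded against the local filesystem before Hadoop sees it.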

 

You can verify the index by running a Solr query from the Admin UI or the command line.

[root@m2 solr]# curl "http://localhost:8983/solr/test_shard2_replica1/select?wt=json&indent=true&q=foundation&fl=id,name,price"

{
  "responseHeader":{
    "status":0,
    "QTime":127,
    "params":{
      "fl":"id,name,price",
      "indent":"true",
      "q":"foundation",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"maxScore":0.6775111,"docs":[
      {
        "id":"0553293354",
        "price":[7.99],
        "name":["Foundation"]}]
  }}
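For a scripted check, the numFound value in the response is usually enough to tell whether the ingest job indexed anything. A minimal sketch that reads the JSON response on stdin and prints that number follows. num_found is a hypothetical helper, and the pattern assumes the response layout shown above rather than parsing JSON properly.

```shell
#!/bin/sh
# Sketch: print the numFound value from a Solr JSON response read on stdin.
# num_found is an illustrative name; this is pattern matching, not JSON parsing.
num_found() {
  sed -n 's/.*"numFound":\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Usage:
# curl -s "http://localhost:8983/solr/test_shard2_replica1/select?wt=json&q=foundation" | num_found
```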

 

