Step 1: Install Solr on all DataNodes
[root@m1 ~]# yum install lucidworks-hdpsearch
Loaded plugins: fastestmirror
Setting up Install Process
Loading mirror speeds from cached hostfile
* base: ftp.iitm.ac.in
* epel: ftp.jaist.ac.jp
* extras: ftp.iitm.ac.in
* updates: ftp.iitm.ac.in
epel/primary_db | 5.8 MB 00:04
extras | 3.4 kB 00:00
extras/primary_db | 37 kB 00:00
updates | 3.4 kB 00:00
updates/primary_db | 1.4 MB 00:04
Resolving Dependencies
--> Running transaction check
---> Package lucidworks-hdpsearch.noarch 0:2.3-4 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
==========================================================================================================================================================================================================
Package Arch Version Repository Size
==========================================================================================================================================================================================================
Installing:
lucidworks-hdpsearch noarch 2.3-4 HDP-UTILS-1.1.0.20 681 M
Transaction Summary
==========================================================================================================================================================================================================
Install 1 Package(s)
Total download size: 681 M
Installed size: 791 M
Is this ok [y/N]: y
Downloading Packages:
lucidworks-hdpsearch-2.3-4.noarch.rpm | 681 MB 69:48
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
Installing : lucidworks-hdpsearch-2.3-4.noarch 1/1
Executing pre-install script
Distribution found: RedHat
Checking for available disk space …
Available space: 71% KB
Minimun space: 2907510 KB
/var/tmp/rpm-tmp.6Te05z: line 82: [: 71%: integer expression expected
Minimum required disk space available
Verifying installation directories…
Creating installation directory
Installation directory created: /opt/lucidworks-hdpsearch
Validating user …
Group solr already exists
User solr already exists
Checking java …
Java 1.7 found
Executing post-install script
Creating symbolic link for Solr …
Created symbolic link
====
Package lucidworks-hdpsearch was installed
====
Verifying : lucidworks-hdpsearch-2.3-4.noarch 1/1
Installed:
lucidworks-hdpsearch.noarch 0:2.3-4
Complete!
Step 2: Create a soft link for the Solr logs:
ln -s /opt/lucidworks-hdpsearch/solr/server/logs /var/log/solr
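You can confirm the link points at the Solr log directory with:
ls -l /var/log/solr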
Step 3: Configure Solr Cloud
Since all Solr data will be stored in HDFS, it is important to increase the time Solr waits before it kills the Solr process on shutdown (whenever you execute "service solr stop/restart"). If this setting is not adjusted, the stop script will forcibly kill the process, because shutdown takes longer when HDFS is used, and this will usually leave the Solr indexes of your collections locked. If the index of a collection is locked, the following exception is shown after the startup routine: "org.apache.solr.common.SolrException: Index locked for write"
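If an index does get locked, a common recovery (sketch only; the collection, replica, and data path below are illustrative and depend on your solr.hdfs.home) is to stop Solr on the affected node and remove the leftover lock file from HDFS:
hdfs dfs -rm /user/solr/test/core_node1/data/index/write.lock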
Increase the sleep time from 5 to 30 seconds in /opt/lucidworks-hdpsearch/solr/bin/solr:
sed -i 's/(sleep 5)/(sleep 30)/g' /opt/lucidworks-hdpsearch/solr/bin/solr
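To confirm the change was applied, grep the script for the new value:
grep -n "sleep 30" /opt/lucidworks-hdpsearch/solr/bin/solr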
Adjust the Solr configuration in /opt/lucidworks-hdpsearch/solr/bin/solr.in.sh:
SOLR_HEAP="1024m"
SOLR_HOST=`hostname -f`
ZK_HOST="m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr"
Make sure the installation directory is owned by the solr user:
chown -R solr:solr /opt/lucidworks-hdpsearch/
Step 4: Create an HDFS directory for Solr. This directory will be used for all Solr data (indexes, etc.).
hdfs dfs -mkdir /apps/solr
hdfs dfs -chown solr /apps/solr
hdfs dfs -chmod 750 /apps/solr
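Depending on how your cluster is secured, the commands above may need to run as the hdfs superuser (the sudo form below is an assumption about your setup); you can then verify the result:
sudo -u hdfs hdfs dfs -ls /apps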
Step 5: ZooKeeper: SolrCloud uses ZooKeeper to store configurations and cluster state. It is recommended to create a separate znode for Solr. The following commands can be executed on one of the Solr nodes.
Initialize the ZooKeeper znode for Solr:
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -cmd makepath /solr
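To confirm the znode exists, the same zkcli.sh ships a list command that prints the ZooKeeper tree:
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181 -cmd list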
Step 6: Adjust solrconfig.xml (/opt/lucidworks-hdpsearch/solr_collections/films/conf)
1) Remove any existing directoryFactory element
2) Add the new directory factory for HDFS (make sure to adjust the value of solr.hdfs.home)
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://m1.hdp22:8020/user/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">false</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>
Adjust the lock type
Search for the lockType element and change it to "hdfs":
<lockType>hdfs</lockType>
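If you prefer to script this change, a minimal sed sketch (assuming the lockType element sits on a single line in solrconfig.xml) is:
sed -i 's|<lockType>.*</lockType>|<lockType>hdfs</lockType>|' /opt/lucidworks-hdpsearch/solr_collections/films/conf/solrconfig.xml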
Now push the config to ZooKeeper:
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr -cmd upconfig -confname labs -confdir /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf
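If you want to double-check what landed in ZooKeeper, zkcli.sh can pull the config back down with downconfig (the /tmp target directory here is just an example):
/opt/lucidworks-hdpsearch/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr -cmd downconfig -confname labs -confdir /tmp/labs-check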
Step 7: Start Solr on all the installed nodes.
[solr@m1 solr]$ bin/solr start -c -z m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181 -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs
You will see the following message on a successful start:
Started Solr server on port 8983 (pid=8744). Happy searching!
Step 8: Check the Solr service status on all the nodes.
[solr@m1 solr]$ bin/solr status
Found 1 Solr nodes:
Solr process 8744 running on port 8983
{
  "solr_home":"/opt/lucidworks-hdpsearch/solr/server/solr/",
  "version":"5.2.1 1684708 - shalin - 2015-06-10 23:20:13",
  "startTime":"2016-07-19T14:29:17.09Z",
  "uptime":"0 days, 0 hours, 0 minutes, 6 seconds",
  "memory":"83 MB (%16.9) of 490.7 MB",
  "cloud":{
    "ZooKeeper":"m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181",
    "liveNodes":"3",
    "collections":"0"}}
Step 9: Now you can test your Solr cluster by creating a sample collection with shards and replicas.
[solr@m1 solr]$ /opt/lucidworks-hdpsearch/solr/bin/solr create -c test -d /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf -n test -s 2 -rf 2
Connecting to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181
Uploading /opt/lucidworks-hdpsearch/solr/server/solr/configsets/data_driven_schema_configs_hdfs/conf for config test to ZooKeeper at m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181
Creating new collection 'test' using command:
http://192.168.56.41:8983/solr/admin/collections?action=CREATE&name=test&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=test
{
  "responseHeader":{
    "status":0,
    "QTime":14404},
  "success":{"":{
    "responseHeader":{
      "status":0,
      "QTime":13252},
    "core":"test_shard1_replica1"}}}
Now you can visit the Solr admin UI and see that the cores have been created.
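If you prefer the command line to the UI, the Collections API reports the same information (any of your Solr hosts will do; m1.hdp22 is assumed here):
curl "http://m1.hdp22:8983/solr/admin/collections?action=CLUSTERSTATUS&wt=json"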
Step 10: Upload a sample file to HDFS if it is not already there.
[root@m2 solr]# hadoop fs -put /opt/lucidworks-hdpsearch/solr/example/exampledocs/books.csv /csv/
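If the put fails because /csv does not exist yet, create the directory first:
hadoop fs -mkdir -p /csv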
Step 11: Now you can create an index on top of this data file in HDFS.
[root@m2 solr]# hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr
or
[solr@m1 solr]$ hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c sampletest -i csv/* -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr
Create an index from a specific file location:
hadoop jar /opt/lucidworks-hdpsearch/job/lucidworks-hadoop-job-2.0.3.jar com.lucidworks.hadoop.ingest.IngestJob -Dmapreduce.job.queuename=ado -DcsvFieldMapping=0=id,1=cat,2=name,3=price,4=instock,5=author -DcsvFirstLineComment -DidField=id -DcsvDelimiter="," -Dlww.commit.on.close=true -cls com.lucidworks.hadoop.ingest.CSVIngestMapper -c test -i /csv/books2.csv -of com.lucidworks.hadoop.io.LWMapRedOutputFormat -zk m1.hdp22:2181,m2.hdp22:2181,w1.hdp22:2181/solr
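After an ingest job finishes, a quick way to confirm that documents actually landed in the collection is a match-all query that returns only the count:
curl "http://localhost:8983/solr/test/select?q=*:*&rows=0&wt=json"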
You can verify it by running a Solr query via the UI or the command line.
[root@m2 solr]# curl "http://localhost:8983/solr/test_shard2_replica1/select?wt=json&indent=true&q=foundation&fl=id,name,price"
{
  "responseHeader":{
    "status":0,
    "QTime":127,
    "params":{
      "fl":"id,name,price",
      "indent":"true",
      "q":"foundation",
      "wt":"json"}},
  "response":{"numFound":1,"start":0,"maxScore":0.6775111,"docs":[
      {
        "id":"0553293354",
        "price":[7.99],
        "name":["Foundation"]}]
  }}