Near Real Time (NRT) search means that documents are available for search almost immediately after being indexed. A commit operation makes index changes visible to new search requests.
There are two types of commit in Solr, namely:
- Soft commit: Gives users a very near real time view of the index. With NRT, a user can change a commit command into a soft commit, which avoids the costly parts of a standard commit. Soft commits do not write changes to stable storage, so changes covered only by soft commits can be lost. You will still want to do standard (hard) commits to ensure that documents are in stable storage. A soft commit does NOT truncate the tlog (transaction log), which continues to grow.
- Hard commit: Uses the transaction log to get the IDs of the latest document changes, and also calls ‘fsync’ on the index files to ensure they have been flushed to stable storage, so that no data is lost if the Solr process crashes or the hardware fails. A hard commit closes the current index segment and truncates the tlog: a new tlog is started, and old tlogs are deleted if there are more than 100 documents in newer tlogs.
A soft commit is much faster than a hard commit, since it only makes index changes visible and does not ‘fsync’ index files or write a new index descriptor. If the JVM crashes or there is a loss of power, changes that occurred after the last hard commit will be lost. Search collections that have NRT requirements will want to soft commit often, but hard commit less frequently. A soft commit may be “less expensive” in terms of time, but it is not free, since it can slow throughput.
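As a rough illustration of the two commit types, the sketch below builds the update URLs that request each one through Solr's update handler, using the `commit=true` and `softCommit=true` request parameters. The base URL, the collection name `mycollection`, and the helper `commit_url` are assumptions made for this example; the code only constructs the URLs and does not contact a server.

```python
from urllib.parse import urlencode


def commit_url(base_url: str, collection: str, soft: bool = False) -> str:
    """Build an update URL that asks Solr for a soft or a hard commit.

    Hypothetical helper for illustration; it does not send any request.
    """
    # softCommit=true requests a soft commit; commit=true requests a hard commit.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"  # assumed local Solr instance
    print(commit_url(base, "mycollection", soft=True))  # frequent, cheap visibility
    print(commit_url(base, "mycollection"))             # less frequent, durable
```

In an NRT setup, a client following the advice above would request the soft-commit URL often and the hard-commit URL rarely, or leave both to automatic commit settings on the server.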
An ‘Optimize’ is like a hard commit except that it forces all of the index segments to be merged into a single segment first. Depending on the use, this operation should be performed infrequently (e.g., nightly), if at all, since it involves reading and re-writing the entire index. Segments are normally merged over time anyway (as determined by the merge policy), and optimize just forces these merges to occur immediately.
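An optimize can be requested through the same update handler. The sketch below builds such a URL with the `optimize=true` parameter and the optional `maxSegments` parameter; the base URL, collection name, and the helper `optimize_url` are assumptions for illustration, and no request is actually sent.

```python
from urllib.parse import urlencode


def optimize_url(base_url: str, collection: str, max_segments: int = 1) -> str:
    """Build an update URL that asks Solr to optimize (force-merge) the index.

    Hypothetical helper for illustration; it does not send any request.
    """
    if max_segments < 1:
        raise ValueError("max_segments must be at least 1")
    # optimize=true forces segment merges; maxSegments sets the target segment count.
    params = {"optimize": "true", "maxSegments": max_segments}
    return f"{base_url.rstrip('/')}/{collection}/update?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local Solr instance and collection name.
    print(optimize_url("http://localhost:8983/solr", "mycollection"))
```

Because the operation rewrites the entire index, a URL like this would typically be called from an infrequent scheduled job (for example, nightly) rather than from indexing code.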