Published 2021-02-11.
Last modified 2023-12-09.
Time to read: 3 minutes.
This article originally included information about implementing full-text search in django-oscar.
That information has been hived off to a separate article.
The Solr Reference Guide can be viewed online.
Linux Filesystem Hierarchy Standard
The Linux Foundation released v3.0 of the Filesystem Hierarchy Standard (FHS) in June 2015.
This is important for application developers (so that they know to create temporary files in /tmp/ rather than in the user’s home directory, for instance), but it is also important for system administrators.
Not only does FHS specify where the directories go, but it also specifies important properties like which directories must be mounted read-only (critical for security).
Solr ignores the FHS, resulting in reduced security.
Installing Solr
The instructions for Installing Solr are well-written but suboptimal. The Solr documentation is fairly good, but it could still use improvement.
Solr requires a Java runtime environment (JRE) and does not need the entire JDK:
$ sudo apt install -y default-jre
I downloaded the large zipped tar file called solr-8.8.0.tgz like this:
$ cd ~/Downloads/
$ wget -qO solr-8.8.0.tgz \
https://archive.apache.org/dist/lucene/solr/8.8.0/solr-8.8.0.tgz
The installation script is provided within the zipped tar file as solr-8.8.0/bin/install_solr_service.sh.
When I looked at the source code for the installation script, I felt that the official installation instructions were suboptimal.
By default, the installer extracts the contents of the zipped tar file into /opt/ and creates a symlink at /opt/solr that points to the versioned directory.
There is no need to unzip the entire tar file just so the installation script can run.
Just the one file can be extracted into the current directory with this incantation:
$ tar xvf solr-8.8.0.tgz -C ./ --strip-components=2 \
solr-8.8.0/bin/install_solr_service.sh
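To see what --strip-components does without downloading anything, here is a minimal sketch that builds a toy archive with the same kind of layout and extracts a single member from it. The archive name demo-1.0.tgz and the files inside it are made up for this illustration; only the tar options match the command above.

```shell
#!/usr/bin/env bash
# Toy demonstration of extracting one member from a gzipped tarball.
# The archive demo-1.0.tgz and its contents are invented for this sketch.
set -eu

work=$(mktemp -d)
cd "$work"

# Build a small archive that mimics the layout <name>/bin/<script>.
mkdir -p demo-1.0/bin demo-1.0/docs
printf '#!/bin/sh\necho installer\n' > demo-1.0/bin/install.sh
echo "readme" > demo-1.0/docs/README.txt
tar czf demo-1.0.tgz demo-1.0
rm -r demo-1.0

# Extract only bin/install.sh; --strip-components=2 drops the leading
# "demo-1.0/bin/" so the file lands directly in the target directory.
mkdir out
tar xzf demo-1.0.tgz -C out --strip-components=2 demo-1.0/bin/install.sh

extracted=$(ls out)
echo "$extracted"
```

Only the named member is written; the docs directory in the archive is never unpacked.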
The installer’s help information is useful:
$ ./install_solr_service.sh
Usage: install_solr_service.sh <path_to_solr_distribution_archive> [OPTIONS]

The first argument to the script must be a path to a Solr distribution archive, such as solr-5.0.0.tgz
(only .tgz or .zip are supported formats for the archive)

Supported OPTIONS include:

  -d  Directory for live / writable Solr files, such as logs, pid files, and index data; defaults to /var/solr
  -i  Directory to extract the Solr installation archive; defaults to /opt/
      The specified path must exist prior to using this script.
  -p  Port Solr should bind to; default is 8983
  -s  Service name; defaults to solr
  -u  User to own the Solr files and run the Solr process as; defaults to solr
      This script will create the specified user account if it does not exist.
  -f  Upgrade Solr. Overwrite symlink and init script of previous installation.
  -n  Do not start Solr service after install, and do not abort on missing Java

NOTE: Must be run as the root user
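The options in the help text above can be combined. The following sketch only assembles and prints a non-default command line so it can be reviewed before running it; it does not run the installer. The port 8984 and the service name solr-test are hypothetical values chosen for illustration, and the other values are simply the documented defaults spelled out.

```shell
#!/usr/bin/env bash
# Sketch: assemble a non-default installer command line without running it.
# The port and service name below are illustrative assumptions.
set -eu

archive="solr-8.8.0.tgz"
install_dir="/opt"        # -i: where the archive is extracted (must already exist)
data_dir="/var/solr"      # -d: logs, pid files, and index data
port="8984"               # -p: hypothetical non-default port
service="solr-test"       # -s: hypothetical service name
run_user="solr"           # -u: account that owns the files and runs Solr

# -n: do not start the service after installing.
cmd=(sudo ./install_solr_service.sh "$archive"
     -i "$install_dir" -d "$data_dir" -p "$port"
     -s "$service" -u "$run_user" -n)

# Print the command for review instead of executing it.
cmdline="${cmd[*]}"
echo "$cmdline"
```

Keeping the arguments in an array means that values containing spaces would survive intact if the command were later run with "${cmd[@]}".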
The installation using default arguments was painless:
$ sudo ./install_solr_service.sh solr-8.8.0.tgz
id: 'solr': no such user
Creating new user: solr
Adding system user 'solr' (UID 132) ...
Adding new group 'solr' (GID 139) ...
Adding new user 'solr' (UID 132) with group 'solr' ...
Creating home directory '/var/solr' ...
Extracting solr-8.8.0.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-8.8.0 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983
Started Solr server on port 8983 (pid=9603). Happy searching!
$ rm ./install_solr_service.sh
The installer does not update the PATH to point to the Solr installation directory, which is /opt/solr.
I defined a bash alias to make controlling Solr more convenient:
$ echo "alias solr=/opt/solr/bin/solr" >> ~/.bash_aliases
$ source ~/.bash_aliases
$ solr status
Found 1 Solr nodes:
Solr process 2818 running on port 8983
{
  "solr_home":"/opt/solr/server/solr",
  "version":"8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:12:52",
  "startTime":"2021-02-12T12:22:42.968Z",
  "uptime":"0 days, 0 hours, 2 minutes, 49 seconds",
  "memory":"38.9 MB (%7.6) of 512 MB"}
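A bash alias is only expanded in interactive shells, so scripts and cron jobs will not see it. One alternative, sketched here under the assumption that Solr is reachable through /opt/solr as described above, is to append its bin directory to PATH while guarding against duplicates. The function name path_append is made up for this example.

```shell
#!/usr/bin/env bash
# Sketch: add a directory to PATH only if it is not already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;               # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /opt/solr/bin
path_append /opt/solr/bin      # second call is a no-op
export PATH
echo "$PATH"
```

Placing these lines in a shell startup file would make the solr command available by name; which startup file is appropriate depends on the host and is left open here.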
Solr Home
The installation program created a new user called solr with home directory /var/solr.
This location does not comply with the FHS.
The FHS describes /home as follows:
/home is a fairly standard concept, but it is clearly a site-specific filesystem.
The setup will differ from host to host.
Therefore, no program should assume any specific location for a home directory, rather it should query for it.
The FHS describes /var as follows:
/var contains variable data files.
This includes spool directories and files, administrative and logging data, and transient and temporary files.
This deficiency means that production installations are more likely to have permission-related security problems.
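The advice to query for a home directory rather than assume one can be followed from a shell script. This is a minimal sketch that assumes a Linux host where getent is available; the helper name home_of is made up, and root is used only as a convenient default for the demonstration. On a host with Solr installed, the account name solr could be passed instead.

```shell
#!/usr/bin/env bash
# Sketch: look up an account's home directory instead of hard-coding it.
home_of() {
  # Field 6 of a passwd entry is the home directory.
  getent passwd "$1" | cut -d: -f6
}

user="${1:-root}"          # "root" is only a default for this demo
home_dir=$(home_of "$user")
echo "$user -> $home_dir"
```

An unknown account yields an empty string, which a caller should treat as an error rather than as a path.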
Process ID File
The running solr process ID is saved in the Solr user home directory, for example, /var/.
According to the FHS, PID files, which were originally placed in /etc, must be placed in /run.
The naming convention for PID files is <program-name>.pid.
For example, the crond PID file is named /run/crond.pid.
Again, this deficiency means that production installations are more likely to have permission-related security problems.
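The PID file convention is easy to demonstrate. This sketch uses a temporary directory as a stand-in for /run so that it can be run without root, and a sleep process as a stand-in for a service; the program name demo-daemon is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the <program-name>.pid convention, using a temporary directory
# in place of /run so that no root privileges are needed.
set -eu

run_dir=$(mktemp -d)
program="demo-daemon"               # hypothetical program name
pid_file="$run_dir/$program.pid"

sleep 30 &                          # stand-in for a long-running service
echo "$!" > "$pid_file"             # decimal PID followed by a newline

# A supervisor can read the file and probe the process with signal 0.
pid=$(cat "$pid_file")
if kill -0 "$pid" 2>/dev/null; then
  status="running"
else
  status="stopped"
fi
echo "$program ($pid): $status"

kill "$pid"                         # stop the stand-in process
rm -f "$pid_file"                   # a stale PID file would be misleading
```

Because every service writes to the same well-known directory under a predictable name, monitoring tools do not need per-application knowledge to find the file.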
Logs
Solr log files are stored in the Solr user home directory, /var/.
$ ls /var/solr/logs/
solr-8983-console.log solr.log solr_gc.log solr_slow_requests.log
As previously mentioned, the FHS states that log files should be placed in /var/.
Apache Solr does not seem to allow that, at least according to the documentation I have seen so far.
Again, this deficiency means that production installations are more likely to have permission-related security problems.
Configuration
The placement of the Solr configuration file in /var/solr/data/solr.xml again ignores FHS conventions for configuration file placement.
According to the FHS, the /etc hierarchy contains configuration files.
A "configuration file" is a local file used to control the operation of a program; it must be static and cannot be an executable binary.
It is recommended that files be stored in subdirectories of /etc rather than directly in /etc.
Again, this deficiency means that production installations are more likely to have permission-related security problems.
Here is the configuration file, with the copyright notice removed for the reader's convenience:
<?xml version="1.0" encoding="UTF-8" ?>
<!--
  This is an example of a simple "solr.xml" file for configuring one or
  more Solr Cores, as well as allowing Cores to be added, removed, and
  reloaded via HTTP requests.

  More information about options available in this configuration file,
  and Solr Core administration can be found online:
  https://lucene.apache.org/solr/guide/format-of-solr-xml.html
-->
<solr>
  <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
  <str name="sharedLib">${solr.sharedLib:}</str>
  <str name="allowPaths">${solr.allowPaths:}</str>

  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">${solr.port.advertise:0}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>
  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
  </shardHandlerFactory>

  <metrics enabled="${metricsEnabled:true}"/>
</solr>
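Most values in the file above are placeholders of the form ${name:default}: Solr substitutes the named system property if it is set and otherwise falls back to the text after the colon. As a rough aid for reading such a file, this sketch lists the property names and defaults found in a few sample lines copied from the listing. The parsing is deliberately naive and assumes that property names contain no colon.

```shell
#!/usr/bin/env bash
# Sketch: list the property names and default values referenced by
# ${name:default} placeholders. The sample lines are copied from solr.xml.
sample='<int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
<int name="hostPort">${solr.port.advertise:0}</int>
<str name="hostContext">${hostContext:solr}</str>
<str name="sharedLib">${solr.sharedLib:}</str>'

# grep -o isolates each placeholder; sed strips the ${ } wrapper and
# replaces the first colon with a readable separator.
placeholders=$(printf '%s\n' "$sample" \
  | grep -o '[$]{[^}]*}' \
  | sed -e 's/^[$]{//' -e 's/}$//' -e 's/:/ default=/')
echo "$placeholders"
```

An empty default, as with solr.sharedLib, means the setting is effectively unset unless the property is supplied when Solr starts.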
Solr Administrative Console
The Solr admin console is served at http://localhost:8983/solr/.
Conclusion
I do not recommend Apache Solr for most uses. Because Apache Solr does not adhere to the Linux Filesystem Hierarchy Standard, users must establish and verify secure file and directory permissions themselves, which creates unnecessary security risk.
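For readers who run Solr anyway, one small part of verifying permissions can be scripted. This is a minimal sketch that reports entries writable by any user; it builds a throwaway directory tree so that it can be run safely, and the function name audit_world_writable is made up. On a real host the function might be pointed at /var/solr, which is an assumption about where the data lives based on the installer defaults described earlier.

```shell
#!/usr/bin/env bash
# Sketch: report entries under a directory tree that are writable by "other".
# A throwaway tree is built here so the demo does not touch real data.
set -eu

audit_world_writable() {
  # -perm -0002 matches anything with the world-write bit set;
  # symlinks are skipped because their mode bits are not meaningful.
  find "$1" ! -type l -perm -0002 -print
}

tree=$(mktemp -d)
mkdir "$tree/data" "$tree/logs"
touch "$tree/data/index" "$tree/logs/solr.log"
chmod 750 "$tree/data" "$tree/logs"
chmod 640 "$tree/data/index"
chmod 666 "$tree/logs/solr.log"     # deliberately unsafe, for the demo

findings=$(audit_world_writable "$tree")
echo "$findings"
```

A check like this catches only one class of problem; ownership and group membership need separate review.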