Django and Oscar

Installing Apache Solr on Ubuntu 20.04

Published 2021-02-11. Last modified 2023-12-09.

This page is part of the django collection.

This article originally included information about implementing full-text search in django-oscar. That information has been moved to a separate article.

The Solr Reference Guide can be viewed online.

Installing Apache Solr on a Linux system exposes that system to unnecessary security risks.

Linux Filesystem Hierarchy Standard

The Linux Foundation released v3.0 of the Filesystem Hierarchy Standard (FHS) in June 2015.

The FHS defines the basic structure of a Unix-like operating system — what the directories are, what types of files and data belong in each, and so on.

This is important for application developers (so that they know to create temporary files in /tmp/ rather than in the user’s home directory, for instance), but it is also important for system administrators.

Not only does FHS specify where the directories go, but it also specifies important properties like which directories must be mounted read-only (critical for security).

Solr ignores the FHS, resulting in reduced security.

Installing Solr

The official instructions for installing Solr are well written, but the procedure they describe is suboptimal. The Solr documentation is fairly good, but it could still use improvement.

Solr requires a Java runtime environment (JRE) and does not need the entire JDK:

Shell
$ sudo apt install -y default-jre
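Before running the Solr installer, it can help to confirm that a Java runtime is actually reachable. The following is a minimal sketch; the helper name require_cmd is hypothetical and not part of Solr or Ubuntu.

```shell
# Succeed if the named command can be found on the PATH, otherwise
# print a message to stderr and fail. The function name is hypothetical.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "$1 not found on PATH" >&2
  return 1
}
```

A typical use would be `require_cmd java && java -version`, which prints the runtime version only when java is installed.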

I downloaded the large zipped tar file called solr-8.8.0.tgz like this:

Shell
$ cd ~/Downloads/
$ wget -qO solr-8.8.0.tgz \
  https://archive.apache.org/dist/lucene/solr/8.8.0/solr-8.8.0.tgz
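Since the archive is about to be handed to a script that runs as root, it seems prudent to verify the download first. The sketch below assumes that a companion checksum file (for example solr-8.8.0.tgz.sha512) has been downloaded from the same directory, and that its first whitespace-separated field is the hex digest. Both points are assumptions to check against the actual file. The function name verify_sha512 is hypothetical.

```shell
# Compare a file's SHA-512 digest with the digest stored in a checksum file.
# Assumption: the first whitespace-separated field of the checksum file
# is the expected hex digest.
verify_sha512() {
  archive=$1
  checksum_file=$2
  expected=$(awk '{print $1; exit}' "$checksum_file")
  actual=$(sha512sum "$archive" | awk '{print $1}')
  if [ -n "$actual" ] && [ "$expected" = "$actual" ]; then
    echo "OK: $archive"
  else
    echo "MISMATCH: $archive" >&2
    return 1
  fi
}
```

It would be called as `verify_sha512 solr-8.8.0.tgz solr-8.8.0.tgz.sha512`, and the installer would only be run if that succeeds.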

The installation script is provided within the zipped tar file as solr-8.8.0/bin/install_solr_service.sh. After reading the source code of the installation script, I felt that the official installation instructions were suboptimal. By default, the installer extracts the contents of the zipped tar file into /opt/ and creates the symbolic link /opt/solr. There is no need to unpack the entire tar file by hand just so the installation script can run. The one file can be extracted into the current directory with this incantation:

Shell
$ tar xvf solr-8.8.0.tgz -C ./ --strip-components=2 \
  solr-8.8.0/bin/install_solr_service.sh
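To see what --strip-components=2 does without touching the real download, the same technique can be tried on a throwaway archive. This is only an illustration: the names demo-1.0, hello.sh and README.txt are made up, and the layout merely mimics the Solr tarball (top-level directory, then bin/, then the script).

```shell
# Build a small archive laid out like the Solr tarball. All names here
# are hypothetical and exist only for this demonstration.
workdir=$(mktemp -d)
mkdir -p "$workdir/src/demo-1.0/bin" "$workdir/out"
echo 'echo hello' > "$workdir/src/demo-1.0/bin/hello.sh"
echo 'not needed' > "$workdir/src/demo-1.0/README.txt"
tar czf "$workdir/demo-1.0.tgz" -C "$workdir/src" demo-1.0

# Extract only the one member. --strip-components=2 drops the leading
# "demo-1.0/bin/" so the script lands directly in the target directory.
tar xzf "$workdir/demo-1.0.tgz" -C "$workdir/out" --strip-components=2 \
  demo-1.0/bin/hello.sh

ls "$workdir/out"
```

The listing should show only hello.sh: the README is never extracted, and no demo-1.0/bin/ directories are created.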

The installer’s help information is useful:

Shell
$ ./install_solr_service.sh
Usage: install_solr_service.sh <path_to_solr_distribution_archive> [OPTIONS]
The first argument to the script must be a path to a Solr distribution archive, such as solr-5.0.0.tgz
  (only .tgz or .zip are supported formats for the archive)
Supported OPTIONS include:
-d Directory for live / writable Solr files, such as logs, pid files, and index data; defaults to /var/solr
-i Directory to extract the Solr installation archive; defaults to /opt/
   The specified path must exist prior to using this script.
-p Port Solr should bind to; default is 8983
-s Service name; defaults to solr
-u User to own the Solr files and run the Solr process as; defaults to solr
   This script will create the specified user account if it does not exist.
-f Upgrade Solr. Overwrite symlink and init script of previous installation.
-n Do not start Solr service after install, and do not abort on missing Java
NOTE: Must be run as the root user
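The defaults listed in the help text can also be spelled out explicitly, which documents the choices in a provisioning script. The sketch below only prints such a command line rather than running it. The option values are simply the defaults shown above, plus -n so that the service is not started immediately. The function name solr_install_command is hypothetical.

```shell
# Print (rather than run) an installer invocation that spells out the
# defaults from the help text, plus -n to skip starting the service.
solr_install_command() {
  archive=$1
  echo "sudo ./install_solr_service.sh $archive -i /opt -d /var/solr -u solr -s solr -p 8983 -n"
}

solr_install_command solr-8.8.0.tgz
```

Printing first makes it easy to review the command before pasting it into a root shell.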

The installation using default arguments was painless:

Shell
$ sudo ./install_solr_service.sh solr-8.8.0.tgz
id: 'solr': no such user
Creating new user: solr
Adding system user 'solr' (UID 132) ...
Adding new group 'solr' (GID 139) ...
Adding new user 'solr' (UID 132) with group 'solr' ...
Creating home directory '/var/solr' ...
Extracting solr-8.8.0.tgz to /opt
Installing symlink /opt/solr -> /opt/solr-8.8.0 ...
Installing /etc/init.d/solr script ...
Installing /etc/default/solr.in.sh ...
Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983
Started Solr server on port 8983 (pid=9603). Happy searching!

$ rm ./install_solr_service.sh

The installer does not add the Solr bin directory, /opt/solr/bin, to the PATH. I defined a bash alias to make controlling Solr more convenient:

Shell
$ echo "alias solr=/opt/solr/bin/solr" >> ~/.bash_aliases
$ source ~/.bash_aliases
$ solr status
Found 1 Solr nodes:
Solr process 2818 running on port 8983
{
  "solr_home":"/opt/solr/server/solr",
  "version":"8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:12:52",
  "startTime":"2021-02-12T12:22:42.968Z",
  "uptime":"0 days, 0 hours, 2 minutes, 49 seconds",
  "memory":"38.9 MB (%7.6) of 512 MB"}
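An alias only works in interactive shells. An alternative, sketched here as one possible approach rather than anything the Solr installer provides, is to append /opt/solr/bin to the PATH, taking care not to add it twice when the startup file is sourced repeatedly. The function name path_append is hypothetical.

```shell
# Append a directory to PATH unless it is already present.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on the PATH, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /opt/solr/bin
export PATH
```

Placed in ~/.profile or ~/.bashrc, this makes `solr status` work in scripts as well as at the prompt.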

Solr Home

The installation program created a new user called solr with home directory /var/solr. This location does not comply with the FHS, which describes /home and /var as follows:

/home is a fairly standard concept, but it is clearly a site-specific filesystem. The setup will differ from host to host. Therefore, no program should assume any specific location for a home directory, rather it should query for it.

/var contains variable data files. This includes spool directories and files, administrative and logging data, and transient and temporary files.

This deficiency means that production installations are more likely to have permission-related security problems.
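The FHS advice to query for a home directory instead of assuming one is easy to follow from a shell script. The following is a minimal sketch; the function name home_of is hypothetical, and the fallback branch assumes local accounts are listed in /etc/passwd.

```shell
# Print the home directory recorded for a user in the account database.
home_of() {
  if command -v getent >/dev/null 2>&1; then
    getent passwd "$1" | cut -d: -f6
  else
    # Fallback for systems without getent: read /etc/passwd directly.
    awk -F: -v u="$1" '$1 == u {print $6}' /etc/passwd
  fi
}
```

After the default installation described above, `home_of solr` would be expected to print /var/solr.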

Process ID File

The running solr process ID is saved in the Solr user home directory, for example, /var/solr/solr-8983.pid.

Process identifier (PID) files, which were originally placed in /etc, must be placed in /run. The naming convention for PID files is <program-name>.pid. For example, the crond PID file is named /run/crond.pid.

Again, this deficiency means that production installations are more likely to have permission-related security problems.
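Wherever the PID file lives, scripts that manage the service need to read it and check that the process is still there. This is a sketch of one common way to do that, not Solr's own logic; the function name pid_file_alive is hypothetical.

```shell
# Succeed if the PID file is readable and names a running process.
pid_file_alive() {
  pid_file=$1
  [ -r "$pid_file" ] || return 1
  pid=$(cat "$pid_file")
  # kill -0 sends no signal; it only checks that the process exists and
  # that the caller may signal it, so run this as the owner or as root.
  kill -0 "$pid" 2>/dev/null
}
```

For the installation above, the call would be `sudo sh -c '. ./pidcheck.sh; pid_file_alive /var/solr/solr-8983.pid'`, where pidcheck.sh is a hypothetical file holding the function.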

Logs

Solr log files are stored in the Solr user home directory, /var/solr/logs.

Shell
$ ls /var/solr/logs/
solr-8983-console.log  solr.log  solr_gc.log  solr_slow_requests.log 

As previously mentioned, the FHS states that log files should be placed in /var/log. Apache Solr does not seem to allow that, at least according to the documentation I have seen so far.

Again, this deficiency means that production installations are more likely to have permission-related security problems.
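The installer output points at /etc/default/solr.in.sh as the place to customize startup configuration, so that file is a reasonable place to see which directories an installation actually uses. The sketch below assumes the file sets shell variables named SOLR_PID_DIR, SOLR_HOME and SOLR_LOGS_DIR. Those names are an assumption and should be checked against the file on your system; whether changing them is sufficient to move the logs to /var/log was not tested for this article. The function name show_solr_dirs is hypothetical.

```shell
# List the directory-related settings found in a Solr include file.
# The variable names are an assumption about what the installer writes;
# commented-out lines are ignored.
show_solr_dirs() {
  grep -E '^[[:space:]]*(SOLR_PID_DIR|SOLR_HOME|SOLR_LOGS_DIR)=' "$1"
}
```

It would be called as `show_solr_dirs /etc/default/solr.in.sh`.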

Configuration

The placement of the Solr configuration file in /var/solr/data/solr.xml again ignores FHS conventions for configuration file placement.

The /etc hierarchy contains configuration files. A "configuration file" is a local file used to control the operation of a program; it must be static and cannot be an executable binary.

It is recommended that files be stored in subdirectories of /etc rather than directly in /etc.

Again, this deficiency means that production installations are more likely to have permission-related security problems.
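Because configuration, logs and PID files all share one tree, verifying permissions by hand is part of the job. One simple check, sketched here as a starting point rather than a complete audit, is to list anything under that tree that every user on the system may write to. The function name world_writable is hypothetical.

```shell
# List entries under a directory that any user on the system may write to.
# Symbolic links are skipped because their own mode bits are not meaningful.
world_writable() {
  find "$1" -xdev ! -type l -perm -0002 -print
}
```

Running `world_writable /var/solr` as root should ideally print nothing.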

Here is the configuration file, with the copyright notice removed for the reader's convenience:

/var/solr/data/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!--
   This is an example of a simple "solr.xml" file for configuring one or
   more Solr Cores, as well as allowing Cores to be added, removed, and
   reloaded via HTTP requests.

   More information about options available in this configuration file,
   and Solr Core administration can be found online:
   https://lucene.apache.org/solr/guide/format-of-solr-xml.html
-->
<solr>

  <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
  <str name="sharedLib">${solr.sharedLib:}</str>
  <str name="allowPaths">${solr.allowPaths:}</str>

  <solrcloud>

    <str name="host">${host:}</str>
    <int name="hostPort">${solr.port.advertise:0}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>

  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
  </shardHandlerFactory>

  <metrics enabled="${metricsEnabled:true}"/>

</solr>
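Most values in this file are placeholders of the form ${name:default}. As far as I understand it, the name is looked up as a property at startup and the text after the colon is used when the property is not set. To get a quick overview of what can be tuned, the placeholders can be pulled out of the file. This is a sketch only; the function name list_solr_properties is hypothetical.

```shell
# Print each distinct ${name:default} placeholder in a file as name:default.
list_solr_properties() {
  grep -o '\${[^}]*}' "$1" | sed -e 's/^\${//' -e 's/}$//' | LC_ALL=C sort -u
}
```

For example, `list_solr_properties /var/solr/data/solr.xml` would include the line solr.max.booleanClauses:1024.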

Solr Administrative Console

The Solr admin console is available at http://localhost:8983/solr/.

Conclusion

I do not recommend Apache Solr for most uses. Because Apache Solr does not adhere to the Linux Filesystem Hierarchy Standard, users must establish and verify secure file and directory permissions themselves, which exposes their systems to unnecessary security risks.


