Mike Slinn

Installing Apache Solr on Ubuntu 20.04

Published 2021-02-11. Last modified 2021-02-12.
Time to read: about 3 minutes.

This article is categorized under Django-Oscar, Ubuntu.

This article originally included information about implementing full-text search in django-oscar. That information has been moved to a separate post.

Installing Apache Solr on a Linux system exposes that system to unnecessary security risks

Linux Filesystem Hierarchy Standard

The Linux Foundation maintains the Filesystem Hierarchy Standard (FHS). In the Linux.com article A Look at the Filesystem Hierarchy Standard 3.0, author Nathan Willis writes:

Not only does FHS specify where the directories go, but it specifies important properties like which directories must be mounted read-only (critical for security).

Installing Solr

The official instructions for installing Solr are well written, but the procedure they describe is suboptimal. The Solr documentation as a whole is fairly good, but could still be improved.

Solr requires a Java runtime environment (JRE) and does not need the entire JDK:

Shell
$ yes | sudo apt install default-jre
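
Before continuing, it may be worth confirming that a Java runtime really is on the PATH. The following is a minimal sketch; the helper name require_cmd is hypothetical and is not part of Solr or Ubuntu.

```shell
# Hypothetical helper: succeed only if the named command is on the PATH.
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    echo "$1 not found on PATH" >&2
    return 1
  }
}

# Usage:
#   require_cmd java && java -version
```

If the check fails after installing default-jre, opening a new shell so that the PATH is re-read is a reasonable first thing to try.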

I downloaded the large zipped tar file called solr-8.8.0.tgz like this:

Shell
$ cd ~/Downloads/
/home/mslinn 

$ wget -qO solr-8.8.0.tgz \
  https://archive.apache.org/dist/lucene/solr/8.8.0/solr-8.8.0.tgz
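
It is prudent to verify a download before running anything from it as root. The sketch below compares a file's SHA-512 digest with an expected value. It assumes the expected digest has been obtained separately, for example from a .sha512 file published next to the archive (I have not shown that file here, so treat its name and format as an assumption). The helper name verify_sha512 is hypothetical.

```shell
# Hypothetical helper: compare a file's SHA-512 digest with an expected hex string.
# $1 = file to check, $2 = expected digest (upper or lower case hex).
verify_sha512() {
  file=$1
  # sha512sum prints lowercase hex, so normalize the expected value to match.
  expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  actual=$(sha512sum "$file" | cut -d ' ' -f 1)
  [ "$actual" = "$expected" ]
}

# Usage:
#   verify_sha512 solr-8.8.0.tgz "<expected hex digest>" && echo "checksum OK"
```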

The installation script is provided within the zipped tar file as solr-8.8.0/bin/install_solr_service.sh. After reading the source code for the installation script, I concluded that the official installation instructions were suboptimal. By default, the installer extracts the contents of the zipped tar file into /opt/solr-8.8.0/ and creates a symlink to that directory at /opt/solr. There is no need to waste storage space by unpacking the entire tar file beforehand just so the installation script can be run. That one file can be extracted into the current directory with this incantation:

Shell
$ tar xvf solr-8.8.0.tgz -C ./ --strip-components=2 \
  solr-8.8.0/bin/install_solr_service.sh
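
To see what --strip-components does without touching the real Solr archive, the same technique can be tried on a throwaway archive with a similar layout. This is only a sketch: every file and directory name in it is made up for the demonstration, and it assumes GNU tar, which is what Ubuntu ships.

```shell
# Build a throwaway archive that mimics the layout <name>/bin/<script>.
# All names below are invented for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/src/demo-1.0/bin" "$work/out"
echo 'echo hello' > "$work/src/demo-1.0/bin/install.sh"
echo 'other content' > "$work/src/demo-1.0/README.txt"
tar czf "$work/demo-1.0.tgz" -C "$work/src" demo-1.0

# List the archive to find the exact path of the member to extract.
tar tzf "$work/demo-1.0.tgz"

# Extract only that member into $work/out, dropping the two leading
# path components (demo-1.0/ and bin/).
tar xzf "$work/demo-1.0.tgz" -C "$work/out" --strip-components=2 \
  demo-1.0/bin/install.sh

ls "$work/out"
```

Only install.sh should end up in the output directory; README.txt is never unpacked.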

The installer’s help information is useful:

Shell
$ ./install_solr_service.sh
Usage: install_solr_service.sh <path_to_solr_distribution_archive> [OPTIONS]

  The first argument to the script must be a path to a Solr distribution archive, such as solr-5.0.0.tgz
    (only .tgz or .zip are supported formats for the archive)

  Supported OPTIONS include:

    -d     Directory for live / writable Solr files, such as logs, pid files, and index data; defaults to /var/solr

    -i     Directory to extract the Solr installation archive; defaults to /opt/
             The specified path must exist prior to using this script.

    -p     Port Solr should bind to; default is 8983

    -s     Service name; defaults to solr

    -u     User to own the Solr files and run the Solr process as; defaults to solr
             This script will create the specified user account if it does not exist.

    -f     Upgrade Solr. Overwrite symlink and init script of previous installation.

    -n     Do not start Solr service after install, and do not abort on missing Java

 NOTE: Must be run as the root user 
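
The options in the help text can be combined on one command line. The sketch below only prints such a command so it can be reviewed before anything is run as root. The values shown are the defaults listed in the help text, plus -n so that the service is not started automatically; build_install_cmd is a hypothetical helper, not part of Solr.

```shell
# Hypothetical helper: print an installer command line with every default
# from the help text spelled out, without running it.
# $1 = path to the Solr archive, $2 = port (optional, defaults to 8983).
build_install_cmd() {
  archive=$1
  port=${2:-8983}
  echo "sudo ./install_solr_service.sh $archive -i /opt -d /var/solr -u solr -s solr -p $port -n"
}

build_install_cmd solr-8.8.0.tgz
```

Spelling out the defaults like this makes it obvious which directories the installer is about to write to.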

The Linux Filesystem Hierarchy Standard states that log files should be placed in /var/log. Apache Solr does not seem to allow that, at least according to the documentation I have seen so far. This deficiency means that production installations are more likely to have permissions-related security problems.

The actual installation using default arguments was painless:

Shell
$ sudo ./install_solr_service.sh solr-8.8.0.tgz
id: ‘solr’: no such user
Creating new user: solr
Adding system user `solr' (UID 132) ...
Adding new group `solr' (GID 139) ...
Adding new user `solr' (UID 132) with group `solr' ...
Creating home directory `/var/solr' ...

Extracting solr-8.8.0.tgz to /opt

Installing symlink /opt/solr -> /opt/solr-8.8.0 ...

Installing /etc/init.d/solr script ...

Installing /etc/default/solr.in.sh ...

Service solr installed.
Customize Solr startup configuration in /etc/default/solr.in.sh
Waiting up to 180 seconds to see Solr running on port 8983
Started Solr server on port 8983 (pid=9603). Happy searching! 

$ rm ./install_solr_service.sh
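
The installer waits for Solr to come up on its port. A similar check can be scripted for later use, for example after a reboot. The following is a rough sketch that polls a URL with wget; the helper name wait_for_url is hypothetical, and it assumes that the Solr URL answers plain HTTP requests without authentication.

```shell
# Hypothetical helper: poll a URL until it responds or roughly $2 seconds pass.
# $1 = URL to poll, $2 = number of one-second retries (optional, defaults to 180).
wait_for_url() {
  url=$1
  timeout=${2:-180}
  elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    # A single quiet attempt; the response body is discarded.
    if wget -q -O /dev/null --tries=1 --timeout=2 "$url"; then
      return 0
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 1
}

# Usage:
#   wait_for_url http://localhost:8983/solr/ 30 && echo "Solr is up"
```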

The installer does not add the directory containing the Solr executables, /opt/solr/bin, to the PATH. I defined a bash alias to make controlling Solr more convenient:

Shell
$ echo "alias solr=/opt/solr/bin/solr" >> ~/.bash_aliases

$ source ~/.bash_aliases

$ solr status

Found 1 Solr nodes:

Solr process 2818 running on port 8983
{
  "solr_home":"/opt/solr/server/solr",
  "version":"8.8.0 b10659f0fc18b58b90929cfdadde94544d202c4a - noble - 2021-01-25 19:12:52",
  "startTime":"2021-02-12T12:22:42.968Z",
  "uptime":"0 days, 0 hours, 2 minutes, 49 seconds",
  "memory":"38.9 MB (%7.6) of 512 MB"}

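Bash only expands aliases in interactive shells by default, so the alias above is not available inside scripts. An alternative is to add the Solr bin directory to the PATH. This is a minimal sketch; path_append is a hypothetical helper name.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

# Usage, for example in ~/.profile:
#   path_append /opt/solr/bin
```

Guarding against duplicates keeps the PATH from growing each time the startup file is sourced.
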
Solr Home

The installation program created a new user called solr with home directory /var/solr. This location does not comply with the Linux Filesystem Hierarchy Standard. Log files are stored in /var/solr/logs.

Shell
$ ls /var/solr/logs/
solr-8983-console.log  solr.log  solr_gc.log  solr_slow_requests.log 
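
Because these files live outside the usual locations, it seems worth auditing their permissions by hand. One simple check, sketched here, lists anything under a directory that every user may write to. The helper name list_world_writable is hypothetical, and reading /var/solr will probably require running it as root or as the solr user.

```shell
# Hypothetical helper: list entries under a directory that are world-writable.
# Symbolic links are skipped because their own mode bits are not meaningful.
list_world_writable() {
  find "$1" ! -type l -perm -0002
}

# Usage:
#   list_world_writable /var/solr
```

An empty result is the desired outcome; this check covers only one kind of permission problem.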

The running solr process ID is saved in /var/solr/solr-8983.pid.
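
A pid file can go stale if the process dies without cleaning up. The sketch below checks whether the PID recorded in a file belongs to a live process. The helper name pid_file_alive is hypothetical. Note that kill -0 also fails when the caller lacks permission to signal the process, so for Solr it would need to run as root or as the solr user.

```shell
# Hypothetical helper: succeed if the PID stored in a pid file is a live
# process that the current user is allowed to signal.
pid_file_alive() {
  [ -r "$1" ] || return 1
  pid=$(cat "$1")
  # Signal 0 delivers nothing; it only tests existence and permission.
  kill -0 "$pid" 2>/dev/null
}

# Usage:
#   pid_file_alive /var/solr/solr-8983.pid && echo "Solr is running"
```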

I found the placement of the Solr configuration file in /var/solr/data/solr.xml to be bizarre. This ignores the normal convention, reflected in the Filesystem Hierarchy Standard, of keeping host-specific configuration files under /etc. Here is the file, with the copyright notice removed for the reader's convenience:

/var/solr/data/solr.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!--
   This is an example of a simple "solr.xml" file for configuring one or
   more Solr Cores, as well as allowing Cores to be added, removed, and
   reloaded via HTTP requests.

   More information about options available in this configuration file,
   and Solr Core administration can be found online:
   https://lucene.apache.org/solr/guide/format-of-solr-xml.html
-->

<solr>

  <int name="maxBooleanClauses">${solr.max.booleanClauses:1024}</int>
  <str name="sharedLib">${solr.sharedLib:}</str>
  <str name="allowPaths">${solr.allowPaths:}</str>

  <solrcloud>

    <str name="host">${host:}</str>
    <int name="hostPort">${solr.port.advertise:0}</int>
    <str name="hostContext">${hostContext:solr}</str>

    <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>

    <int name="zkClientTimeout">${zkClientTimeout:30000}</int>
    <int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
    <int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>
    <str name="zkCredentialsProvider">${zkCredentialsProvider:org.apache.solr.common.cloud.DefaultZkCredentialsProvider}</str>
    <str name="zkACLProvider">${zkACLProvider:org.apache.solr.common.cloud.DefaultZkACLProvider}</str>

  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
    class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:600000}</int>
    <int name="connTimeout">${connTimeout:60000}</int>
    <str name="shardsWhitelist">${solr.shardsWhitelist:}</str>
  </shardHandlerFactory>

  <metrics enabled="${metricsEnabled:true}"/>

</solr>
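
Most values in this file are placeholders of the form ${property:default}: Solr uses the named Java system property if it is set, and the default after the colon otherwise. A property could be set with the -a option of solr start, for example -a "-Dsolr.max.booleanClauses=2048". The sketch below lists every placeholder in such a file as property=default; list_placeholders is a hypothetical helper, and it assumes GNU grep for the -o option.

```shell
# Hypothetical helper: print each ${property:default} placeholder found in a
# file as "property=default", one per line. An empty default prints "property=".
list_placeholders() {
  grep -o '\${[^}]*}' "$1" | sed -e 's/^\${//' -e 's/}$//' -e 's/:/=/'
}

# Usage (the file may only be readable by root or the solr user):
#   list_placeholders /var/solr/data/solr.xml
```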

Solr Administrative Console

The Solr admin console is available at http://localhost:8983/solr/.

Solr README.txt

The Solr installation placed a file at /opt/solr/README.txt that contains some useful information. This file can be viewed online. For example, there are a lot of options that can be specified when starting Solr:

Shell
$ solr start -help
Usage: solr start [-f] [-c] [-h hostname] [-p port] [-d directory] [-z zkHost] [-m memory] [-e example] [-s solr.solr.home] [-t solr.data.home] [-a "additional-options"] [-V]

  -f            Start Solr in foreground; default starts Solr in the background
                  and sends stdout / stderr to solr-PORT-console.log

  -c or -cloud  Start Solr in SolrCloud mode; if -z not supplied and ZK_HOST not defined in
                  solr.in.sh, an embedded ZooKeeper instance is started on Solr port+1000,
                  such as 9983 if Solr is bound to 8983

  -h <host>     Specify the hostname for this Solr instance

  -p <port>     Specify the port to start the Solr HTTP listener on; default is 8983
                  The specified port (SOLR_PORT) will also be used to determine the stop port
                  STOP_PORT=($SOLR_PORT-1000) and JMX RMI listen port RMI_PORT=($SOLR_PORT+10000).
                  For instance, if you set -p 8985, then the STOP_PORT=7985 and RMI_PORT=18985

  -d <dir>      Specify the Solr server directory; defaults to server

  -z <zkHost>   Zookeeper connection string; only used when running in SolrCloud mode using -c
                   If neither ZK_HOST is defined in solr.in.sh nor the -z parameter is specified,
                   an embedded ZooKeeper instance will be launched.

  -m <memory>   Sets the min (-Xms) and max (-Xmx) heap size for the JVM, such as: -m 4g
                  results in: -Xms4g -Xmx4g; by default, this script sets the heap size to 512m

  -s <dir>      Sets the solr.solr.home system property; Solr will create core directories under
                  this directory. This allows you to run multiple Solr instances on the same host
                  while reusing the same server directory set using the -d parameter. If set, the
                  specified directory should contain a solr.xml file, unless solr.xml exists in Zookeeper.
                  This parameter is ignored when running examples (-e), as the solr.solr.home depends
                  on which example is run. The default value is server/solr. If passed relative dir,
                  validation with current dir will be done, before trying default server/

  -t <dir>      Sets the solr.data.home system property, where Solr will store index data in <instance_dir>/data subdirectories.
                  If not set, Solr uses solr.solr.home for config and data.

  -e <example>  Name of the example to run; available examples:
      cloud:         SolrCloud example
      techproducts:  Comprehensive example illustrating many of Solr's core capabilities
      dih:           Data Import Handler
      schemaless:    Schema-less example

  -a            Additional parameters to pass to the JVM when starting Solr, such as to setup
                  Java debug options. For example, to enable a Java debugger to attach to the Solr JVM
                  you could pass: -a "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=18983"
                  In most cases, you should wrap the additional parameters in double quotes.

  -j            Additional parameters to pass to Jetty when starting Solr.
                  For example, to add configuration folder that jetty should read
                  you could pass: -j "--include-jetty-dir=/etc/jetty/custom/server/"
                  In most cases, you should wrap the additional parameters in double quotes.

  -noprompt     Don't prompt for input; accept all defaults when running examples that accept user input

  -v and -q     Verbose (-v) or quiet (-q) logging. Sets default log level to DEBUG or WARN instead of INFO

  -V/-verbose   Verbose messages from this script 
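
The help text above mentions several ports that are derived from the main Solr port, which matters when writing firewall rules. The arithmetic can be sketched as follows. The helper name derived_ports and the label EMBEDDED_ZK_PORT are my own; STOP_PORT and RMI_PORT are the names used in the help text.

```shell
# Hypothetical helper: print the ports that the help text says are derived
# from the Solr port given as $1.
derived_ports() {
  solr_port=$1
  echo "STOP_PORT=$((solr_port - 1000))"          # stop port
  echo "RMI_PORT=$((solr_port + 10000))"          # JMX RMI listen port
  echo "EMBEDDED_ZK_PORT=$((solr_port + 1000))"   # embedded ZooKeeper (SolrCloud mode)
}

derived_ports 8985
```

For port 8985 this prints STOP_PORT=7985, RMI_PORT=18985 and EMBEDDED_ZK_PORT=9985, which agrees with the example in the help text.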

Another useful bit of information was the command to shut down all Solr instances:

Shell
$ solr stop -all

Additional topics in this large README.txt file include:

  • Running in standalone (core) mode or distributed mode (aka SolrCloud or collection mode)
  • Data Import Handler example
  • Schema-less example
  • Kitchen sink example that exercises all Solr features
  • Indexing Documents
  • An explanation of files included in an Apache Solr binary distribution

Conclusion

I do not recommend Apache Solr for most uses. Because Apache Solr does not adhere to the Linux Filesystem Hierarchy Standard, users must establish and verify secure file and directory permissions themselves, which exposes their systems to unnecessary security risks.