Mike Slinn

Suggestions for Django-Oscar and Django-Haystack

Published 2021-02-14. Last modified 2021-02-28.
Time to read: about 7 minutes.

This article is categorized under Django, Django-Oscar, PostgreSQL, Ubuntu, e-commerce.
Django is, and will continue to be, a database-agnostic web framework
– From the Django documentation

As I continue to evaluate the django-oscar e-commerce framework, I notice that its documentation strongly suggests that Apache Solr and django-haystack are important dependencies. Django-haystack is a software layer between Django applications, such as django-oscar, and data stores, such as SQL databases and dedicated fulltext search engines like Apache Solr. Django-haystack is quite flexible and could easily be made to work with a wide variety of data stores, in a variety of ways.

django-oscar sends all form data through django-haystack

I looked at the django-oscar source code and found that its forms processor has a hard-coded dependency on django-haystack. Because django-oscar's forms are hardwired to django-haystack, implementing support for additional data store backends is django-haystack's responsibility. I am not advocating that this be changed; I am just pointing out this dependency to help others understand how to work effectively with django-oscar, and hence with django-haystack.

E-Commerce Is Hot, Hot, Hot

US ecommerce grows 44% in 2020
Online spending was $861 billion: 21% of total retail sales

Although I argue in this blog post that most e-commerce sites do not need fulltext search, there are a huge number of e-commerce sites, and that market is growing strongly, in part due to the COVID-19 pandemic.

The fraction of sites that do require fulltext search is high enough, and their data processing needs are substantial enough, that enabling native fulltext search for a platform as popular as django-oscar is likely to be worthwhile for database vendors whose products provide this feature.

Many/Most E-Commerce Sites Do Not Use Fulltext Search

Many e-commerce sites make little use of text-based search; even large sites such as Amazon and Walmart use it sparingly. Instead, many, perhaps most, e-commerce sites favor GUI-based product filters, where users click on images or HTML form widgets to narrow product selections.
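To make this concrete, here is a minimal Python sketch, using only the standard sqlite3 module, of how GUI filter selections (a category select box, color checkboxes, a price-range slider) can be translated into one ordinary parameterized SQL query, with no fulltext engine involved. The product table, its columns and the filter_products helper are hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical product catalog; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table product (name text, category text, color text, price real)"
)
conn.executemany(
    "insert into product values (?, ?, ?, ?)",
    [
        ("Trail shoe", "shoes", "red", 89.0),
        ("Road shoe", "shoes", "blue", 120.0),
        ("Rain jacket", "jackets", "red", 150.0),
        ("Sandal", "shoes", "red", 35.0),
    ],
)


def filter_products(conn, category, colors, min_price, max_price):
    """Translate GUI widget selections into one parameterized SQL query.

    category           -> a select box
    colors             -> a multiple select or a set of checkboxes
    min_price/max_price -> a price-range slider
    No free text is involved, so no fulltext search is needed.
    """
    # One "?" placeholder per selected color; values are never pasted into SQL.
    placeholders = ", ".join("?" for _ in colors)
    sql = (
        "select name from product "
        "where category = ? "
        f"and color in ({placeholders}) "
        "and price between ? and ? "
        "order by price"
    )
    params = [category, *colors, min_price, max_price]
    return [row[0] for row in conn.execute(sql, params)]


print(filter_products(conn, "shoes", ["red"], 30, 100))  # ['Sandal', 'Trail shoe']
```

In a Django project, the same selections would typically become a QuerySet such as Product.objects.filter(category=category, color__in=colors, price__range=(min_price, max_price)), assuming a hypothetical Product model with those fields.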

Walmart visual search filters
Amazon visual search filters

A Brief History of Text Search

Searching for text has gone through several evolutionary stages. The following table omits other data storage technologies, such as key-value stores and graph data stores, because they are generally not used for searching text.

| When used | Technology | Description |
|---|---|---|
| 1960s | DBMS and CODASYL databases | Hierarchical and network data structures with transactions |
| 1970s - present | Relational databases | SQL support; examples: PostgreSQL, MySQL, Oracle |
| 1970s - present | Fulltext databases | Examples: Solr, Elasticsearch, SAP HANA, PostgreSQL |
| 2009 - present | NoSQL databases | Non-relational and distributed databases; examples: MongoDB, Cassandra, Amazon DynamoDB, Bigtable |

SQL Databases

According to DB-Engines, the most popular databases are Oracle, MySQL, Microsoft SQL Server, PostgreSQL, MongoDB, and IBM Db2.

DB-Engines Database Trends

Databases with native fulltext search include Couchbase, MariaDB, MongoDB Atlas, MySQL, Oracle, PostgreSQL, SQL Server and Azure SQL Database.

SQL Search

Relational databases have used SQL (Structured Query Language) to declare searches since the late 1970s. SQL was adopted as a standard by ANSI in 1986 and by ISO in 1987, as ISO 9075:1987 Information processing systems — Database language — SQL. The SQL standard has since been split into smaller standards, each of which has been separately revised many times.

The following SQL statement searches a database table called product for records that contain the word big in the field size:

select * from product where size like '%big%';

The database engine performs the search.
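The corrected query can be tried with Python's built-in sqlite3 module. This minimal sketch assumes a hypothetical product table with name and size columns. Note that LIKE is a plain substring match, so the value "bigger than average" matches too.

```python
import sqlite3

# In-memory database with a hypothetical product table.
conn = sqlite3.connect(":memory:")
conn.execute("create table product (name text, size text)")
conn.executemany(
    "insert into product values (?, ?)",
    [("Tent", "big"), ("Mug", "small"), ("Tote", "bigger than average")],
)

# The query from the text: the database engine scans the size column
# for the substring "big".
rows = conn.execute(
    "select name from product where size like '%big%' order by name"
).fetchall()
print([name for (name,) in rows])  # ['Tent', 'Tote']
```

LIKE knows nothing about words or grammar. Also, SQLite's LIKE is case-insensitive for ASCII characters by default, whereas PostgreSQL's LIKE is case-sensitive (PostgreSQL provides ILIKE for case-insensitive matching).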

Django QuerySets employ SQL queries to search databases for matching records. QuerySets are lazily evaluated, which means their query results are only computed if needed. Meta-information about QuerySet results, such as counts, can be obtained without running the full query.
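The following toy Python class illustrates lazy evaluation in the spirit of QuerySets: building and filtering a query runs no SQL, count() asks the database for a count without fetching any rows, and iteration fetches the rows once and caches them. LazyQuery is hypothetical and is not how Django implements QuerySets; in Django itself, the equivalent would be something like Product.objects.filter(price__lt=50).count() for a hypothetical Product model.

```python
import sqlite3


class LazyQuery:
    """Toy lazily evaluated query, loosely inspired by Django QuerySets.

    Clause text is interpolated into SQL, so it must be trusted code;
    user-supplied values go through "?" parameters only.
    """

    def __init__(self, conn, table, where="1 = 1", params=()):
        self.conn = conn
        self.table = table
        self.where = where
        self.params = tuple(params)
        self._cache = None      # filled only when rows are actually needed
        self.queries_run = []   # records each SQL statement this object executes

    def filter(self, clause, *params):
        # Returns a new query object; no SQL is executed here.
        return LazyQuery(
            self.conn, self.table, f"({self.where}) and ({clause})",
            self.params + params,
        )

    def _execute(self, sql):
        self.queries_run.append(sql)
        return self.conn.execute(sql, self.params)

    def count(self):
        # Reuse cached rows if we have them; otherwise ask only for the count.
        if self._cache is not None:
            return len(self._cache)
        sql = f"select count(*) from {self.table} where {self.where}"
        return self._execute(sql).fetchone()[0]

    def __iter__(self):
        # Rows are fetched on first iteration and cached afterwards.
        if self._cache is None:
            sql = f"select * from {self.table} where {self.where}"
            self._cache = self._execute(sql).fetchall()
        return iter(self._cache)


conn = sqlite3.connect(":memory:")
conn.execute("create table product (name text, price real)")
conn.executemany(
    "insert into product values (?, ?)",
    [("Tent", 200.0), ("Mug", 8.0), ("Lamp", 45.0)],
)

cheap = LazyQuery(conn, "product").filter("price < ?", 50)
print(cheap.queries_run)                   # [] : building the query ran no SQL
print(cheap.count())                       # 2
print(sorted(name for name, _ in cheap))   # ['Lamp', 'Mug']
```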

NoSQL Search

Fulltext search is less common in NoSQL databases than in SQL databases. Perhaps this is due to the lack of standardization for NoSQL query languages.

Fulltext Search

Fulltext search can support more nuanced queries based on a model of human language grammar. It does this by performing language-aware token normalization (for example, lowercasing, removing stop words, and reducing words to their stems) before matching. However, fulltext search only works properly if the user's language is correctly identified; plain SQL queries have no such requirement.
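A toy Python sketch of the difference: like_match imitates a case-insensitive SQL LIKE '%...%' substring test, while fulltext_match first normalizes both strings into English tokens by lowercasing, dropping a few stop words and stripping a few suffixes. The stop-word list and suffix rules are hypothetical and far cruder than a real stemmer such as Snowball; they only make sense for English, which is why the user's language must be known.

```python
import re

# Illustrative English-only rules; real engines use proper per-language stemmers.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "for", "in", "on", "with"}
SUFFIXES = ("ing", "ed", "s")


def normalize(text):
    """Lowercase, split into word tokens, drop stop words, strip one suffix."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    result = set()
    for token in tokens:
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Keep at least 3 characters so short words are left alone.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        result.add(token)
    return result


def fulltext_match(query, document):
    # Match when every normalized query token appears in the normalized document.
    return normalize(query) <= normalize(document)


def like_match(query, document):
    # Rough equivalent of SQL: document LIKE '%query%' (case-insensitive).
    return query.lower() in document.lower()


doc = "Waterproof boots for hiking in the rain"
print(like_match("hiking boot", doc))      # False
print(fulltext_match("hiking boot", doc))  # True
```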

Fulltext search would be useful if the user’s language is known and any of the following is true:

  • One or more files must be searched, instead of searching database records.
  • The pattern to search for is contextual.
  • A phrase must be normalized before searching.

Fulltext search does not perform language translation. The language of the user's keywords or phrase must match the language of the data store's contents. For example, a query containing Chinese keywords would not match a data store that contains only French data.

Implementing fulltext search outside the data store, as a separate process, can make it difficult to keep the database contents and the external fulltext search engine synchronized. To avoid these problems, some people write to the database but route every read through the fulltext search interface. This extra software layer reduces the efficiency of every read, which increases latency and compute cost.
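The synchronization problem can be sketched in a few lines of Python. ToySearchIndex is a hypothetical stand-in for an external search engine, and save_product is a hypothetical helper that writes to the database and then to the index as two separate steps, so when the index is unreachable the two stores silently diverge.

```python
import sqlite3


class ToySearchIndex:
    """Hypothetical stand-in for an external fulltext engine (a separate process)."""

    def __init__(self):
        self.docs = {}          # document id -> indexed text
        self.available = True   # flip to False to simulate an outage

    def index(self, doc_id, text):
        if not self.available:
            raise ConnectionError("search engine unreachable")
        self.docs[doc_id] = text

    def search(self, word):
        return sorted(i for i, t in self.docs.items() if word in t.lower())


db = sqlite3.connect(":memory:")
db.execute("create table product (id integer primary key, name text)")
search = ToySearchIndex()


def save_product(product_id, name):
    # Dual write: the database commit and the index update are separate steps,
    # and nothing makes them atomic.
    with db:  # commits the insert when the block exits without an error
        db.execute("insert into product values (?, ?)", (product_id, name))
    try:
        search.index(product_id, name)
    except ConnectionError:
        pass  # the row is saved, but the index is now stale


save_product(1, "Red tent")
search.available = False  # simulate the external engine being down
save_product(2, "Red kayak")

in_db = [
    row[0]
    for row in db.execute("select id from product where name like '%red%' order by id")
]
print(in_db)                 # [1, 2]
print(search.search("red"))  # [1]
```

A reader that queries only the search index never sees the kayak, even though it is in the database; closing that gap requires extra machinery such as retries or periodic reindexing.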

It is most efficient to use a database that can perform fulltext search internally, rather than adding a dedicated fulltext search engine that runs as an external process.

Django already has one official fulltext search module, django.contrib.postgres.search, for PostgreSQL. Fulltext search has been part of PostgreSQL since 2008, and support for phrase search was added in 2016. The article Full-Text Search in Django with PostgreSQL discusses implementing fulltext search using only Django and PostgreSQL, without resorting to external processes.

Django-haystack, and hence django-oscar, merely lack a little glue to enable fulltext search within databases that natively support that feature.

Update 2021-02-28: I just came across another Django framework (django-machina) that had been using django-haystack for search. They seriously considered a pull request for PostgreSQL fulltext search 3 months ago but did not merge it in the end. The issue remains unresolved for django-machina and django-oscar.

Sentiment Analysis

Sentiment analysis is independent of the storage technology. It uses natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study affective states and subjective information.

As of February 2021, most e-commerce sites have no need for sentiment analysis, but that is likely to change in the coming years.
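To show that sentiment analysis operates on the text itself, regardless of where that text is stored, here is a deliberately naive lexicon-based scorer in Python. The word lists and the scoring formula are hypothetical; production systems use trained language models or large curated lexicons and handle negation and context, which this sketch does not.

```python
import re

# Hypothetical toy word lists.
POSITIVE = {"great", "love", "excellent", "comfortable", "fast"}
NEGATIVE = {"broken", "slow", "poor", "hate", "late"}


def sentiment_score(review):
    """Return (positive words - negative words) / total words, in [-1, 1]."""
    words = re.findall(r"[a-z']+", review.lower())
    if not words:
        return 0.0
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    return (pos - neg) / len(words)


reviews = [
    "Great boots, very comfortable",
    "Arrived late and the zipper was broken",
]
for text in reviews:
    print(f"{sentiment_score(text):+.2f}  {text}")
```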

The Case Against Apache Solr with django-haystack

E-commerce websites must contend with traffic bursts. An online store may be idle for a while, yet its traffic can spike very sharply. Technical managers would not normally add complex, demanding applications to a system that must perform well under heavy load unless they had a compelling reason.

Apache Solr is a complex, demanding external application. Its installation process is complex and does not adhere to Linux directory standards, which introduces security risks. Much of the installation is undocumented manual work, which increases the likelihood of an insecure or improperly configured installation. If django-haystack had proper native database fulltext support, there would be very little reason to consider Apache Solr for django-oscar installations. Removing Apache Solr from a django-oscar installation would make that e-commerce website faster, more reliable and more secure, and would make it easier to scale out.

Overhead is generally lower when fewer processes are required, and overhead is minimized when fulltext search is performed natively in the database.

Suggestions

In conclusion, I offer the following independent suggestions:

django-oscar Suggestions

  1. Reanimate or deprecate django-haystack. haystacksearch.org shows that the most recent release of django-haystack was over 3 years ago, and the most recent release on readthedocs.io (2.8.1) is dated 2017-01-02. Did a major sponsor pull the plug on django-haystack at that time? If django-haystack is effectively dead, then it should either be reanimated with new sponsorship or be deprecated by django-oscar.
  2. Explain fulltext search, and how it relates to django-haystack, on both of these django-oscar documentation pages:
    1. Oscar Core Apps explained » Search
    2. Recipes » How to setup Solr with Oscar » Integrating with Haystack
  3. The Frobshop tutorial should introduce fulltext search and django-haystack, then recommend SimpleEngine. Because django-haystack is hard-coded into django-oscar form handling, and django-oscar programmers must properly understand how to configure django-haystack, the Frobshop tutorial project should briefly introduce django-haystack, briefly mention fulltext search, and recommend configuring SimpleEngine instead of Apache Solr. There is no point in introducing advanced possibilities to newcomers when they provide no demonstrable value to a tutorial project.
  4. The Frobshop tutorial should have all references to Apache Solr removed. I do not believe that most e-commerce sites, which mostly feature GUI-driven product selection pages, have an unquenchable thirst for fulltext search. As I show above, most product filters use HTML selects, multiple selects, radio buttons, sliders, numeric ranges, etc. Unfortunately, the Frobshop “getting started” tutorial positions Apache Solr as a standard part of most django-oscar installations, when fulltext search is likely required by only a minority of django-oscar installations.
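For suggestion 3, the Frobshop tutorial could show a settings fragment like the one below, which selects django-haystack's simple backend. The engine path haystack.backends.simple_backend.SimpleEngine is the one given in the django-haystack documentation; as I understand it, this backend searches the project's existing SQL database through the Django ORM, so no Solr process is needed.

```python
# settings.py (excerpt): use django-haystack's simple backend, which works
# against the project's existing SQL database instead of an external
# search engine such as Apache Solr.
HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.simple_backend.SimpleEngine",
    },
}
```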

django-haystack Suggestions

  1. Deprecate the term SimpleEngine in favor of SqlEngine. The django-haystack SimpleEngine is misnamed because it only works with SQL databases, whereas django-haystack could interface with NoSQL databases just as well if backends were written for them. Those backends might also be simple (i.e. they would not necessarily support fulltext search), but they would not use SQL.
  2. Add a mention of plain SQL to the 2nd paragraph of the first page of the Haystack website. With the proposed addition of “plain SQL”, that sentence would read: “Haystack is BSD licensed, plays nicely with third-party apps without needing to modify the source and supports plain SQL, Solr, Elasticsearch, Whoosh and Xapian.”
  3. Rewrite the SimpleEngine warning. Instead of discouraging the use of SimpleEngine, rewrite this information into a more balanced explanation of what using this plain SQL interface would mean in terms of user-visible functionality. As a newbie reading the django-haystack documentation, I got the strong impression that there is, for some reason, a burning need to use fulltext search all the time. This is simply not the case for many, or even most, applications. Given that web application frameworks like django-oscar have standardized on django-haystack for all HTML form generation, this leads to misconceptions, especially among newcomers.
  4. Add fulltext support to django-haystack for additional data stores, including Oracle, MySQL, Microsoft SQL Server, PostgreSQL and IBM Db2. This topic has been discussed online many times over several years. The consensus is that the best way to add support for an additional data store to django-haystack would be to clone the xapian-haystack project and modify it to work with the newly supported database. Microsoft may well find it in their interest to do the work for Microsoft SQL Server, Oracle may well do the work for Oracle Database and MySQL, and IBM may do the work for Db2. One of the PostgreSQL sponsors might take on the work for PostgreSQL.
  5. Remove any mention of the django-haystack Google Group. It has been inactive for 3 years, and new posts are not possible.

Acknowledgement

Thanks to @acdha for adding GitHub Discussions to the django-haystack GitHub project in response to my request for an alternative to Google Groups.