Published 2021-02-14. Last modified 2021-02-28.
Time to read: about 12 minutes.
– From the Django documentation
As I continue to explore the django-oscar e-commerce framework, I notice that its documentation strongly suggests that Apache Solr and django-haystack are important dependencies.
Django-haystack is a software layer between Django applications, such as django-oscar, and data stores, such as SQL databases and dedicated fulltext search engines like Apache Solr.
Django-haystack is quite flexible and could easily be made to work with a wide range of datastores, in a variety of ways.
django-oscar Sends All Form Data Through django-haystack
I looked at the django-oscar source code and found that its forms processor has a hard-coded dependency on django-haystack. Because django-oscar's forms are hardwired to django-haystack, implementing support for additional datastore backends is django-haystack's responsibility. I am not advocating that this be changed; I am just pointing out this dependency to help others understand how to work effectively with django-oscar and hence django-haystack.
E-Commerce Is Hot, Hot, Hot
In spite of my claim in this blog post that most e-commerce sites do not need fulltext search, there is a huge number of e-commerce sites, and that market is experiencing strong growth, partly due to the COVID-19 pandemic.
The percentage of sites that do require fulltext search is high enough, and their data processing needs are substantial enough, that enabling fulltext search for a platform as popular as django-oscar is likely to be worthwhile for database vendors whose products provide this feature natively.
Many/Most E-Commerce Sites Do Not Use Fulltext Search
Many e-commerce sites do not rely on text-based search, including large sites such as Amazon and Walmart; when text-based search is used, it is used sparingly. Instead, many/most e-commerce sites favor GUI-based product filters, where users click on images or HTML form widgets to narrow product selections.
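GUI filters of this kind map directly onto ordinary SQL predicates, with no text analysis involved. Here is a minimal sketch using Python's built-in sqlite3 module; the product table, its color and price columns, and the filter_products helper are all hypothetical, chosen only to show how a multiple-select and a price slider can become one parameterized query.

```python
import sqlite3

# Hypothetical catalog; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, color TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product (name, color, price) VALUES (?, ?, ?)",
    [
        ("Red sneaker", "red", 59.0),
        ("Blue sneaker", "blue", 64.0),
        ("Red boot", "red", 120.0),
        ("Black sandal", "black", 35.0),
    ],
)


def filter_products(conn, colors=None, min_price=None, max_price=None):
    """Translate filter-widget selections into one parameterized SQL query."""
    clauses, params = [], []
    if colors:  # e.g. a multiple-select or a row of color swatches
        placeholders = ", ".join("?" for _ in colors)
        clauses.append(f"color IN ({placeholders})")
        params.extend(colors)
    if min_price is not None:  # e.g. the lower handle of a price slider
        clauses.append("price >= ?")
        params.append(min_price)
    if max_price is not None:  # e.g. the upper handle of a price slider
        clauses.append("price <= ?")
        params.append(max_price)
    sql = "SELECT name FROM product"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY name"
    return [row[0] for row in conn.execute(sql, params)]
```

For example, `filter_products(conn, colors=["red"], max_price=100)` returns `["Red sneaker"]`. Each widget becomes a comparison on a structured column, which any SQL database handles well without a fulltext engine.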
A Brief History of Text Search
Searching for text has gone through several evolutionary stages. The following table does not describe other data storage technologies, such as key-value stores or graph data stores, because they are generally not used for searching text.
| Period | Technology | Description |
|---|---|---|
| 1960s | DBMS and CODASYL databases | Hierarchical data structures with transactions |
| 1970s - present | Relational databases | SQL support; examples: PostgreSQL, MySQL, Oracle |
| 1970s - present | Fulltext databases | Examples: Solr, Elasticsearch, SAP HANA, PostgreSQL |
| 2009 - present | NoSQL databases | Non-relational and distributed databases; examples: MongoDB, Cassandra, Amazon DynamoDB, Bigtable |
According to the DB-Engines popularity ranking, the leading databases are Oracle, MySQL, Microsoft SQL Server, PostgreSQL, MongoDB, and IBM Db2.
Relational databases have used SQL (Structured Query Language) to declare searches since the late 1970s. ANSI adopted SQL as a standard in 1986, and ISO followed in 1987 with ISO 9075:1987 Information processing systems — Database language — SQL. The SQL standard has since been split into smaller standards, each of which has been separately revised many times.
The following SQL statement searches a database table called product for records whose size field contains the text big:
select * from product where size like '%big%';
The database engine performs the search itself.
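As a quick check of the statement above, here is a minimal sketch using Python's built-in sqlite3 module with a hypothetical product table and a hypothetical helper, products_with_size_containing. It also shows a limitation: LIKE with % wildcards on both sides is a substring test, so it matches bigger as well as big.

```python
import sqlite3

# Hypothetical schema: a product table with a free-text "size" column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, size TEXT)")
conn.executemany(
    "INSERT INTO product (name, size) VALUES (?, ?)",
    [("Tent", "big"), ("Backpack", "bigger than carry-on"), ("Wallet", "small")],
)


def products_with_size_containing(conn, text):
    # '%' on both sides makes LIKE a substring test, not a word match.
    # The ? placeholder keeps user input out of the SQL string.
    cur = conn.execute(
        "SELECT name FROM product WHERE size LIKE ? ORDER BY name",
        (f"%{text}%",),
    )
    return [row[0] for row in cur]
```

Calling `products_with_size_containing(conn, "big")` returns both the tent and the backpack. Case sensitivity also varies by engine: SQLite's LIKE ignores ASCII case by default, while PostgreSQL's LIKE is case-sensitive and ILIKE is its case-insensitive form.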
Django QuerySets generate SQL queries to search databases for matching records.
QuerySets are lazily evaluated, which means their query results are only computed when needed. Meta-information about QuerySet results, such as counts, can be obtained without fetching every matching row; for example, count() issues a SELECT COUNT(*) query.
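To make lazy evaluation concrete without requiring a Django project, here is a toy class on top of sqlite3. LazyQuery is hypothetical and is not Django's QuerySet API; it only mimics the two behaviors described above: building and filtering a query sends nothing to the database, and count() asks the database for an aggregate instead of fetching rows.

```python
import sqlite3


class LazyQuery:
    """Toy lazily evaluated query over sqlite3 (hypothetical; not Django's API)."""

    def __init__(self, conn, table, where="", params=(), log=None):
        self.conn = conn
        self.table = table  # trusted identifier, never user input
        self.where = where
        self.params = tuple(params)
        self.log = [] if log is None else log  # SQL actually sent to the database

    def filter(self, clause, *params):
        # Build a new query object; nothing is sent to the database yet.
        where = f"({self.where}) AND ({clause})" if self.where else clause
        return LazyQuery(self.conn, self.table, where, self.params + params, self.log)

    def _run(self, select):
        sql = f"SELECT {select} FROM {self.table}"
        if self.where:
            sql += f" WHERE {self.where}"
        self.log.append(sql)
        return self.conn.execute(sql, self.params)

    def count(self):
        # Aggregate inside the database instead of fetching every row.
        return self._run("COUNT(*)").fetchone()[0]

    def __iter__(self):
        # Rows are only fetched when the caller iterates.
        return iter(self._run("*").fetchall())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [("Tent", 199.0), ("Stove", 49.0), ("Lamp", 25.0)],
)

cheap = LazyQuery(conn, "product").filter("price < ?", 100)
```

After the last line, `cheap.log` is still empty; only `cheap.count()` or iterating over `cheap` sends SQL. Django's `QuerySet.filter()` similarly returns a new, unevaluated QuerySet, and `count()` on an unevaluated QuerySet issues `SELECT COUNT(*)`.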
Fulltext search is less common for NoSQL databases than for SQL databases, perhaps because NoSQL query languages lack standardization.
Fulltext search can feature more nuanced queries based on a model of human language grammar. Fulltext search does this by performing language-aware token normalization prior to matching. However, proper identification of the user's language is required for fulltext search to work, while this is unnecessary for plain SQL queries.
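To illustrate language-aware token normalization, here is a deliberately tiny, English-only normalizer: it lowercases, tokenizes, drops words from a hypothetical stop-word list, and strips a couple of suffixes. Real engines use full stemmers such as Porter or Snowball and per-language configurations, so treat this as a sketch of the idea rather than a usable implementation.

```python
import re

# Hypothetical, English-only rules; real engines use full stemmers (e.g. Snowball).
STOP_WORDS = {"a", "an", "and", "for", "of", "the"}
SUFFIXES = ("ing", "s")


def normalize(text):
    """Lowercase, tokenize, drop stop words, and strip a few suffixes."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in STOP_WORDS:
            continue
        for suffix in SUFFIXES:
            # Crude: "hiking" becomes "hik". That is acceptable as long as
            # queries and documents are normalized the same way.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(token)
    return terms


def matches(query, document):
    """True if every normalized query term appears in the normalized document."""
    query_terms = normalize(query)
    doc_terms = set(normalize(document))
    return bool(query_terms) and all(term in doc_terms for term in query_terms)
```

With these rules, `matches("boot for hiking", "Hiking boots for the trail")` is true, while a plain substring test fails because the phrase "boot for hiking" never appears verbatim in the document.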
Fulltext search would be useful if the user’s language is known and any of the following is true:
- One or more files must be searched, instead of searching database records.
- The pattern to search for is contextual.
- A phrase must be normalized before searching.
Fulltext search does not perform language translation. The language that the user wrote their keywords or phrase in must match the language of the data store's contents. For example, a query containing Chinese keywords would not match against a data store that only contains French data.
Implementing fulltext search outside the data store, as a separate process, can introduce problems keeping the database contents and the external fulltext search engine synchronized. To avoid these problems, some people write to the database but send every read query through the fulltext search interface. This extra software layer adds a penalty to every database read: reduced efficiency, and therefore increased latency and compute cost.
Django already has one official fulltext search module, for PostgreSQL. Fulltext search has been part of PostgreSQL since 2008, and support for phrase search was added in 2016. The article Full-Text Search in Django with PostgreSQL discusses implementing fulltext search using only Django and PostgreSQL, without resorting to external processes.
Django-haystack, and hence django-oscar, merely lack a little glue to enable fulltext search within databases that natively support that feature.
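To make in-database fulltext search concrete, here is a minimal sketch that swaps in SQLite's FTS5 extension for PostgreSQL, only because SQLite runs inside the Python process and needs no server. It assumes your Python's SQLite was built with FTS5 (common, but not guaranteed), and the product_fts table and its rows are hypothetical. The database itself tokenizes, stems, and ranks the text, so there is no external search engine to keep synchronized.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index; requires SQLite compiled with FTS5."""
    conn = sqlite3.connect(":memory:")
    # The porter tokenizer wraps the default unicode61 tokenizer and applies
    # English Porter stemming, so "running" is indexed as "run".
    conn.execute(
        "CREATE VIRTUAL TABLE product_fts "
        "USING fts5(title, description, tokenize='porter')"
    )
    conn.executemany(
        "INSERT INTO product_fts (title, description) VALUES (?, ?)", rows
    )
    return conn


def search(conn, query):
    # MATCH takes FTS5 query syntax; ORDER BY rank sorts by BM25 relevance.
    cur = conn.execute(
        "SELECT title FROM product_fts WHERE product_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in cur]


conn = build_index([
    ("Trail runner", "Lightweight running shoes for rough terrain"),
    ("Dress shoe", "Leather shoe for formal occasions"),
    ("Rain jacket", "Waterproof shell with sealed seams"),
])
```

Here `search(conn, "run")` finds the trail runner through the stemmed word "running", and `search(conn, '"running shoes"')` performs a phrase search. In PostgreSQL the same role is played by tsvector values queried with to_tsquery(), which Django exposes through SearchVector and SearchQuery in django.contrib.postgres.search; the missing glue is a django-haystack backend that delegates to native features like these.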
Update 2021-02-28: I just came across another Django framework (django-machina) that had been using django-haystack for search. Its maintainers seriously considered a pull request for PostgreSQL fulltext search 3 months ago but did not merge it in the end. The issue remains unresolved for
Sentiment analysis is independent of the storage technology. It uses natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study affective states and subjective information.
As of February 2021, most e-commerce sites have no need for sentiment analysis, but that is likely to change in the coming years.
The Case Against Apache Solr with django-haystack
E-commerce websites must contend with traffic bursts. An online store may be idle for a while, and then traffic can spike very sharply. Technical managers would not normally choose to add complex, demanding applications to a system that must perform well under heavy load unless they had a compelling reason.
Apache Solr is a complex, demanding external application. Its installation process is complex and does not adhere to Linux directory standards, which introduces security risks. Most of the installation details are undocumented manual work, which increases the likelihood of an insecure and improperly configured installation. If django-haystack had proper native database fulltext support, there would be very little reason to consider Apache Solr for django-oscar installations. Removing Apache Solr from a django-oscar installation would make that e-commerce website faster, more reliable, and more secure, and would make it easier to scale out.
In conclusion, I offer the following independent suggestions:
- Reanimate or deprecate django-haystack
haystacksearch.org shows that the most recent release of django-haystack was over 3 years ago, and the most recent release on readthedocs.io (2.8.1) was 2017-01-02. Did a major sponsor pull the plug on django-haystack at that time? If django-haystack is effectively dead, then it should either be reanimated with new sponsorship or be deprecated by
- Add a mention of fulltext search, and how it relates to django-oscar, to the Docs web pages:
- The Frobshop tutorial should introduce fulltext search and
django-haystack, then recommend
Because django-haystack is hard-coded into django-oscar form handling, and django-haystack configuration must be properly understood by django-oscar programmers, the Frobshop tutorial project should briefly introduce django-haystack, briefly mention fulltext search, and recommend that SimpleEngine be configured instead. There is no point in introducing advanced possibilities to newcomers when they do not demonstrably provide value to a tutorial project.
- The Frobshop tutorial should have all references to Apache Solr removed. I do not believe that most e-commerce sites, which mostly feature GUI-driven product selection pages, have an unquenchable thirst for fulltext search. As I show above, most product filters use HTML select, multiple select, radio buttons, sliders, numeric ranges, etc. Unfortunately, the Frobshop “getting started” tutorial positions Apache Solr as a standard part of most django-oscar installations. Instead, fulltext search is likely only required for a minority of
- Deprecate the term
SimpleEngine in favor of
SimpleEngine is misnamed because it only works with SQL databases, and django-haystack would interface just fine with NoSQL databases if implementations were written. Those backends might also be simple (i.e., they would not necessarily support fulltext search), but they would not use SQL.
- Add a mention of plain SQL to the 2nd paragraph of the first page of the Haystack website. With the proposed addition of plain SQL, that sentence would read: “Haystack is BSD licensed, plays nicely with third-party apps without needing to modify the source and supports plain SQL, Solr, Elasticsearch, Whoosh and Xapian.”
- Rewrite the
SimpleEngine warning. Instead of discouraging the use of
django-haystack documentation, I got the strong impression that there is, for some reason, a burning need to use fulltext search all the time. This is simply not the case for many, or even most, applications. Given that web application frameworks like django-oscar have standardized on django-haystack for all HTML form generation, this leads to misconceptions, especially among newcomers.
- Add fulltext support to django-haystack for additional datastores, including Oracle, MySQL, Microsoft SQL Server, PostgreSQL and IBM Db2. This topic has been discussed many times online over several years. The consensus is that the best way to add support for additional datastores to django-haystack would be to clone the xapian-haystack project and modify it to work with the newly supported database. Microsoft may well find it within their interest to do the work for Microsoft SQL Server, Oracle may well do the work for Oracle DBMS and MySQL, and IBM may do the work for Db2. One of the PostgreSQL sponsors might take on the work for PostgreSQL.
The django-haystack Google Group has been inactive for 3 years, and new posts are not possible. Any mention of it should be removed.
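Recommending SimpleEngine in a tutorial amounts to a small settings change. The fragment below is a sketch that assumes django-haystack is installed and listed in INSTALLED_APPS; as I understand the backend, SimpleEngine needs no external service because it turns searches into ordinary Django ORM queries. The commented-out Solr connection is shown only for comparison, and its URL is a placeholder.

```python
# settings.py (fragment): assumes django-haystack is installed and
# "haystack" is listed in INSTALLED_APPS.
HAYSTACK_CONNECTIONS = {
    "default": {
        # No external search server; searches run against the Django database.
        "ENGINE": "haystack.backends.simple_backend.SimpleEngine",
    },
}

# For comparison, a Solr-backed connection needs a running Solr server
# (the URL below is a placeholder):
# HAYSTACK_CONNECTIONS = {
#     "default": {
#         "ENGINE": "haystack.backends.solr_backend.SolrEngine",
#         "URL": "http://127.0.0.1:8983/solr",
#     },
# }
```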