Multilingual Search

Search Challenges

Searching with Solr

As with databases, you have to decide whether to store all of your content in a single Solr index with language-specific fields or to use a separate core for each language. Solr is easiest to work with when each document has a single main text field, and separate cores spare you from managing sets of per-language field names in every query. For those reasons I generally recommend the latter approach, especially if your data is not synchronized across languages.
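A minimal sketch of the per-core approach follows. It assumes a hypothetical layout where each language lives in its own core, named after its language code, under `http://localhost:8983/solr`, and that the main text field is called `text` in every core. It only builds standard `/select` URLs (no HTTP request is made) to show that every language can be queried with the same field name, because each core carries its own schema and analyzers.

```python
from urllib.parse import urlencode

# Hypothetical layout: one Solr core per language, named by its language code.
SOLR_BASE = "http://localhost:8983/solr"
LANGUAGE_CORES = {"en": "en", "de": "de", "fr": "fr"}
DEFAULT_LANGUAGE = "en"


def core_url(lang):
    """Return the base URL of the core holding documents in `lang`.

    Unknown language codes fall back to the default language's core.
    """
    core = LANGUAGE_CORES.get(lang, LANGUAGE_CORES[DEFAULT_LANGUAGE])
    return f"{SOLR_BASE}/{core}"


def select_url(lang, query, rows=10):
    """Build a /select request against the language's own core.

    Because each core has its own schema, the query can target the same
    field name ("text") regardless of language.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{core_url(lang)}/select?{params}"


print(select_url("de", "text:Haus"))
# http://localhost:8983/solr/de/select?q=text%3AHaus&rows=10&wt=json
```

Falling back to the default core for unknown language codes is only one possible choice; raising an error instead may be safer if searching the wrong language would produce misleading results.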

The Solr example schema includes field types with reasonable analyzer defaults for most languages, but you should plan to have a native speaker review your results once you have realistic test data available.

Using django-haystack

Haystack 1.x supports only a single Solr backend, so using multiple cores requires some extra work. Once version 2.0 is stable, this should mostly reduce to a simple .using(lang) call.
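For comparison, here is a sketch of what the Haystack 2.x settings might look like with one connection per language. The base URL and core names are assumptions. Haystack 2.x refuses to start unless a connection named `default` exists, so one of the language cores doubles as the default here.

```python
# settings.py (sketch for Haystack 2.x; URL and core names are assumptions)
SOLR_BASE = "http://127.0.0.1:8983/solr"
LANGUAGES = ("en", "de", "fr")

# One connection alias per language, each pointing at its own Solr core.
HAYSTACK_CONNECTIONS = {
    lang: {
        "ENGINE": "haystack.backends.solr_backend.SolrEngine",
        "URL": f"{SOLR_BASE}/{lang}",
    }
    for lang in LANGUAGES
}

# Haystack requires a "default" alias; reuse the English core for it.
HAYSTACK_CONNECTIONS["default"] = dict(HAYSTACK_CONNECTIONS["en"])
```

A view could then search the current language's core with something like `SearchQuerySet().using(request.LANGUAGE_CODE)`, assuming the codes set by Django's `LocaleMiddleware` match the connection aliases exactly (a code such as `en-us` would need to be mapped to `en` first).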

  1. search_sites.py: load multiple backends, one per language
  2. search_indexes.py: configure get_queryset() to filter on language when indexing
  3. Change all views to retrieve the language-specific backend rather than simply calling SearchQuerySet()
  4. Create your own update_index and clear_index management commands which use the language-specific backends and filter database queries accordingly
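The routing behind these four steps can be sketched in plain Python. This is not Haystack 1.x code: `Article`, `InMemoryCore`, `backend_for()`, `index_queryset()`, `update_index()` and `clear_index()` are all hypothetical stand-ins, and `InMemoryCore` only imitates the `add()`/`delete(q=...)` shape of a pysolr client so that the example runs without a Solr server. Document ids imitate Haystack's `app_label.model_name.pk` style. The point is that every read and write goes through a per-language backend, and indexing filters records by language first.

```python
from dataclasses import dataclass


@dataclass
class Article:
    # Hypothetical model: each record carries its own language code.
    pk: int
    language: str
    title: str
    body: str


class InMemoryCore:
    """Stand-in for one Solr core (hypothetical; a real setup might
    construct a pysolr client for the core's URL here instead)."""

    def __init__(self, name):
        self.name = name
        self.docs = {}

    def add(self, docs):
        for doc in docs:
            self.docs[doc["id"]] = doc

    def delete(self, q):
        # Only the match-all query is supported in this sketch.
        if q != "*:*":
            raise NotImplementedError(q)
        self.docs.clear()


# Step 1: one backend per language, created once and reused everywhere.
LANGUAGES = ("en", "de")
CORES = {lang: InMemoryCore(lang) for lang in LANGUAGES}


def backend_for(lang):
    # Step 3: views call this instead of using a single global backend.
    return CORES[lang]


def index_queryset(records, lang):
    # Step 2: only index records written in the core's language.
    return [r for r in records if r.language == lang]


def update_index(records, lang):
    # Step 4: language-aware replacement for the update_index command.
    docs = [
        {"id": f"articles.article.{r.pk}", "text": f"{r.title}\n{r.body}"}
        for r in index_queryset(records, lang)
    ]
    backend_for(lang).add(docs)
    return len(docs)


def clear_index(lang):
    # Step 4: language-aware replacement for the clear_index command.
    backend_for(lang).delete(q="*:*")
```

Keeping the language-to-backend lookup in one function means views and management commands cannot silently fall back to a shared index; a missing language raises a `KeyError` instead.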
