Major Changes in Solr 9
Solr 9.0 is a major new release of Solr.
This page highlights the most important changes including new features and changes in default behavior as well as previously deprecated features that have now been removed.
Solr 9 Upgrade Planning
Before starting an upgrade to Solr 9, please take the time to review all information about changes from the version you are currently on up to Solr 9.
You should also consider all changes that have been made to Solr in any version you have not upgraded to already. For example, if you are currently using Solr 8.1, you should review changes made in all subsequent 8.x releases in addition to changes for 9.0.
A thorough review of the list in Major Changes in Earlier 8.x Versions as well as the CHANGES.txt in your Solr instance will help you plan your migration to Solr 9.
Upgrade Prerequisites
Solr 9 requires Java 11 as the minimum Java version and is also tested with Java 17.
If using CloudSolrClient to connect to your SolrCloud cluster, SolrJ must be upgraded in all your client applications to version 8.10 or higher (8.x), before upgrading your SolrCloud cluster to version 9.0. Otherwise, SolrJ will not be able to connect to the cluster once it has upgraded to Solr 9.
If you have an old collection that was initially created with a Solr version prior to 5.0, Solr may keep that collection’s cluster state in the /clusterstate.json file at the root of Zookeeper. Solr 9 no longer supports this file. You must upgrade such collections to the new per-collection state.json format before upgrading to Solr 9. This is done by calling the Collection API MIGRATESTATEFORMAT action while still using Solr 8.x.
If you are using Solr in standalone mode with the Query Elevation Component with its elevation file in the data directory, you’ll have to move it to the configset folder instead.
If you rely on metrics, alerts, or monitors on Solr KPIs that use the "master" or "slave" terminology, please update your system for those metrics to now show up with "leader" and "follower" terminology.
Rolling Upgrades
If you are planning to upgrade your cluster using a rolling upgrade model (upgrade each node in succession, as opposed to standing up a brand new 9.x cluster), please read the following carefully.
Rolling upgrades from Solr 8 to Solr 9 require upgrading from Solr 8.7 or newer. If you run an 8.x version prior to 8.7, we recommend that you first upgrade to 8.11.
PKI Authentication
Internode communication secured by PKI Authentication has changed formats. For detailed information, see PKI Authentication Plugin.
A rolling upgrade from Solr 8 to Solr 9 requires the following multiple restart sequence:
- Upgrade to Solr 9 and set system properties: solr.pki.sendVersion=v1 and solr.pki.acceptVersions=v1,v2. This will allow Solr 9 nodes to send messages to Solr 8 nodes while the upgrade is in progress.
- Restart with solr.pki.sendVersion=v2 (default, can be unset) and solr.pki.acceptVersions=v1,v2. This will force all nodes to send the new header.
- (Optional) Restart with system property solr.pki.acceptVersions=v2 (default, can be unset) to prevent outdated nodes from connecting to your cluster.
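The restart sequence above can be written down as data, which may help when scripting the upgrade. The following is only a sketch: the helper name pki_flags and the phase descriptions are made up for illustration, and it assumes the properties are passed to the JVM as -D flags (for example through SOLR_OPTS). The third phase sets sendVersion explicitly even though it is the default.

```python
# Hypothetical helper, not part of Solr: maps each phase of the rolling
# upgrade to the JVM system-property flags for that restart.
PKI_PHASES = [
    # (description, solr.pki.sendVersion, solr.pki.acceptVersions)
    ("upgrade nodes to Solr 9", "v1", "v1,v2"),
    ("restart so all nodes send the new header", "v2", "v1,v2"),
    ("optional restart to reject outdated nodes", "v2", "v2"),
]


def pki_flags(phase_index):
    """Return the -D flags for the given phase (0-based)."""
    _, send, accept = PKI_PHASES[phase_index]
    return [
        f"-Dsolr.pki.sendVersion={send}",
        f"-Dsolr.pki.acceptVersions={accept}",
    ]


if __name__ == "__main__":
    for index, (description, _, _) in enumerate(PKI_PHASES):
        print(f"{index + 1}. {description}: {' '.join(pki_flags(index))}")
```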
Reindexing After Upgrade
It is always strongly recommended that you fully reindex your documents after a major version upgrade. For details, see the Reindexing section, which covers several strategies for how to reindex.
In Solr 8, it was possible to add docValues to a schema without re-indexing via UninvertDocValuesMergePolicy, an advanced/expert utility. Due to changes in Lucene 9, that isn’t possible any more.
Solr 9.1
Querying and Indexing
- Added Lucene91HnswVectorsFormat codec for DenseVectorField. To use the new codec, a reindex is necessary.
SolrJ
SolrJ is beginning to be split up. If you use ZooKeeper coordinates to create a CloudSolrClient, you will need to add a dependency on solrj-zookeeper. If you use SolrJ’s Maven POM to depend on SolrJ, then this should happen automatically through transitive resolution. Instead of depending on ZooKeeper, consider specifying a list of Solr URLs in the client’s builder. Not only does this reduce dependencies, but it also improves security by letting you limit ZooKeeper access.
Solr 9.0
Querying and Indexing
- Dense Vector "Neural" Search through the DenseVectorField fieldType and the K-Nearest-Neighbor (KNN) Query Parser.
- Admin UI support for SQL Querying.
- New snowball stemmers: Hindi, Indonesian, Nepali, Serbian, Tamil, and Yiddish.
- New NorwegianNormalizationFilter.
- The implicit /terms handler now returns terms across all shards in SolrCloud instead of only the local core. Users/apps may be assuming the old behavior. A request can be modified via the standard distrib=false param to only use the local core receiving the request.
- SQL support has been moved to the sql module. Existing Solr configurations do not need any SQL-related changes; however, the module needs to be installed - see the section SQL Query Language.
- JSON aggregations now use the corrected sample formula to compute standard deviation and variance. The computation of stdDev and variance in JSON aggregations is now the same as in StatsComponent.
- Facet count in the JSON Facet module always returns a long value, irrespective of the number of shards.
- MacroExpander will no longer expand URL parameters inside of the expr parameter (used by streaming expressions). Additionally, users are advised to use the InjectionDefense class when constructing streaming expressions that include user-supplied data to avoid risks similar to SQL injection. The legacy behavior of expanding the expr parameter can be reinstated with -DStreamingExpressionMacros=true passed to the JVM at startup.
- The response format for field values serialized as raw XML (via the [xml] raw value DocTransformer and wt=xml) has changed. Previously, values were dropped in directly as top-level child elements of each <doc>, obscuring associated field names and yielding inconsistent <doc> structure. As of version 9.0, raw values are wrapped in a <raw name="field_name">[…]</raw> element at the top level of each <doc> (or within an enclosing <arr name="field_name"><raw>[…]</raw></arr> element for multi-valued fields). Existing clients that parse field values serialized in this way will need to be updated accordingly.
- Highlighting: hl.method=unified is the new default. Use hl.method=original to switch back if needed.
- solr.xml maxBooleanClauses is now enforced recursively. Users who upgrade from prior versions of Solr may find that some requests involving complex internal query structures (Example: long query strings using edismax with many qf and pf fields that include query time synonym expansion) which worked in the past now hit this limit and fail. Users in this situation are advised to consider the complexity of their queries/configuration, and increase the value of maxBooleanClauses if warranted.
- Atomic/partial updates to nested documents now require the _root_ field to clearly show the document isn’t a root document. Solr 8 would fall back on the _route_ param, but Solr 9 no longer does.
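To illustrate the standard deviation and variance item above: the text does not spell out the formula, so the sketch below assumes that "sample formula" means the usual n-1 (Bessel-corrected) estimator. It compares that estimator with the population (divide by n) form on made-up values, purely for contrast, using only the Python standard library. It makes no claim about what earlier Solr versions computed.

```python
import statistics

# Made-up example values, not Solr output.
values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def sample_variance(xs):
    """Variance with the n-1 denominator (assumed meaning of 'sample formula')."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Variance with the n denominator, shown only for comparison."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n


if __name__ == "__main__":
    print("sample variance:    ", sample_variance(values))
    print("population variance:", population_variance(values))
    # The standard library implements the same two estimators.
    print("statistics.stdev:   ", statistics.stdev(values))
    print("statistics.pstdev:  ", statistics.pstdev(values))
```

If a dashboard or test compares stdDev values across Solr versions, a difference of this kind (largest for small result sets) is worth checking first.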
Security
- New Certificate Authentication Plugin, enabling end-to-end use of x509 client certificates for Authentication and Authorization.
- Improved security when using PKI Authentication plugin.
- Upgrade to Zookeeper 3.7, allowing for TLS protected ZK communication.
- All request handlers support security permissions. Users may have to adapt their security.json.
- Ability to disable admin UI through a system property.
- The property blockUnknown in the BasicAuthPlugin and the JWTAuthPlugin now defaults to true instead of false. This change is backward incompatible. If you need the pre-9.0 default behavior, you need to explicitly set blockUnknown:false in security.json.
- Solr now runs with the Java security manager enabled by default. Hadoop users may need to disable this.
- Solr now binds to localhost network interface by default for better out of the box security. Administrators that need Solr exposed more broadly can change the SOLR_JETTY_HOST property in their Solr include (solr.in.sh / solr.in.cmd) file.
- Solr embedded zookeeper only binds to localhost by default. This embedded zookeeper should not be used in production. If you rely upon the previous behavior, then you can change the clientPortAddress in solr/server/solr/zoo.cfg.
- Jetty low level request-logging in NCSA format is now enabled by default, with a retention of 3 days worth of logs. This may require some more disk space for logs than was the case in version 8.x. See Configuring Logging for how to change this.
- Hadoop authentication support has been moved to the new hadoop-auth module. Existing Solr configurations do not need any Hadoop authentication related changes; however, the module needs to be installed - see the section Hadoop Authentication Plugin.
- JWTAuthPlugin has been moved to a module. Users need to add the module jwt-auth to classpath. The plugin has also changed package name to org.apache.solr.security.jwt, but can still be loaded as shortform class="solr.JWTAuthPlugin".
- Dependency updates - Many dependency updates remove security issues from dependencies, and thus make Solr more secure.
- The allow-list defining allowed URLs for the shards parameter is not in the shardHandler configuration anymore. It is defined by the allowUrls top-level property of the solr.xml file. For more information, see Format of solr.allowUrls documentation.
- To improve security, StatelessScriptUpdateProcessorFactory has been renamed as ScriptUpdateProcessorFactory and moved to the scripting Module instead of shipping as part of Solr core. This module needs to be enabled explicitly.
- To improve security, XSLTResponseWriter has been moved to the scripting Module instead of shipping as part of Solr core. This module needs to be enabled explicitly.
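For the blockUnknown change in the list above, the following sketch shows one way to restore the pre-9.0 default in a security.json document. The helper name allow_unauthenticated is invented for illustration, and the input is a deliberately incomplete fragment: a real security.json would also carry credentials and usually an authorization section.

```python
import json


def allow_unauthenticated(security):
    """Return a copy of a security.json dict with blockUnknown set to false.

    Hypothetical helper, not part of Solr; it only edits the
    "authentication" section named in the text above.
    """
    updated = json.loads(json.dumps(security))  # cheap deep copy
    updated.setdefault("authentication", {})["blockUnknown"] = False
    return updated


# Minimal fragment for illustration only.
security = {"authentication": {"class": "solr.BasicAuthPlugin"}}

if __name__ == "__main__":
    print(json.dumps(allow_unauthenticated(security), indent=2))
```

Setting the property explicitly, whichever value you choose, keeps the behavior stable across the upgrade instead of depending on the default.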
Stability and Scalability
- Rate limiting provides a way to throttle update and search requests based on usage metrics.
- A new Task management interface allows declaring tasks as cancellable and trackable.
- Ability to specify node roles in Solr. This release supports overseer and data roles out of the box.
- New API for pluggable Replica Placement Plugins that replaces the auto-scaling framework.
- Support for distributed processing of cluster state updates and collection API calls, without relying on the Overseer.
Build and Docker
- Solr is now built and released independently of Lucene (separate Apache projects).
- Build system switched to Gradle, no longer uses Ant + Ivy.
- Docker image creation is now a part of the Apache Solr GitHub repo.
- Docker image documentation is now a part of the reference guide.
- Official Docker image upgraded to use JDK17 (by Eclipse Temurin) and ability to create functionally identical local image.
Logging and Metrics
- Metrics handler only depends on SolrJ instead of core and has its own log4j2.xml and no longer shares Solr’s logging config.
- Only SearchHandler and subclasses have "local" metrics now. It’s now tracked as if it’s another handler with a "[shard]" suffix, e.g. "/select[shard]". There are no longer ".distrib." named metrics; all metrics are assumed to be such except "[shard]". The default Prometheus exporter config splits that component to a new label named "internal". The sample Grafana dashboard now filters to include or exclude this.
- The default port of "Prometheus exporter" has changed from 9983 to 8989, so you may need to adjust your configuration after upgrade.
- Logging is now asynchronous by default. There’s a small window where log messages may be lost in the event of some hard crash. Switch back to synchronous logging if this is unacceptable; see comments in the log4j2 configuration files (log4j2.xml by default).
- Log4J configuration & Solr MDC values - MDC values that Solr sets for use by Logging calls (such as the collection name, shard name, replica name, etc…) have been modified to now be "bare" values, without the special single character prefixes that were included in past versions. The default log4j2.xml configuration file for Solr has been modified to prepend these same prefixes to MDC values when included in Log messages as part of the <PatternLayout/>. Users who have custom logging configurations that wish to ensure Solr 9.x logs are consistently formatted after upgrading will need to make similar changes to their logging configuration files. See SOLR-15630 for more details.
- Jetty Request log is now enabled by default, i.e. logging every request.
- The prometheus-exporter is no longer packaged as a Solr Module. It can be found under solr/prometheus-exporter/.
- Solr modules (formerly known as contribs) can now easily be enabled by an environment variable (e.g. in solr.in.sh or solr.in.cmd) or as a system property (e.g. in SOLR_OPTS). Example: SOLR_MODULES=extraction,ltr.
Deprecations and Removals
- The Data Import Handler (DIH) is an independent project now; it is no longer a part of Solr.
- Support for clusterstate.json has been dropped and the MIGRATESTATEFORMAT API has been removed. If your collections use clusterstate.json, you will need to take some steps, described elsewhere in this document.
- Auto-scaling framework has been removed. Please refer to Replica Placement Plugins for alternate options.
- LegacyBM25SimilarityFactory has been removed.
- Legacy SolrCache implementations (LRUCache, LFUCache, FastLRUCache) have been removed. Users have to modify their existing configurations to use CaffeineCache instead.
- VelocityResponseWriter is an independent project now; it is no longer a part of Solr. This encompasses all previously included /browse and wt=velocity examples.
- Cross Data Center Replication has been removed.
- SolrJ clients like HttpSolrClient and LBHttpSolrClient that lacked HTTP2 support have been deprecated. The old CloudSolrClient has been renamed as CloudLegacySolrClient and deprecated.
- SimpleFSDirectoryFactory is removed in favor of NIOFSDirectoryFactory.
- Removed the deprecated HttpSolrClient.RemoteSolrException and HttpSolrClient.RemoteExecutionException. All the usages are replaced by BaseHttpSolrClient.RemoteSolrException and BaseHttpSolrClient.RemoteExecutionException.
- maxShardsPerNode parameter has been removed because it was broken and inconsistent with other replica placement strategies. Other relevant placement strategies should be used instead, such as autoscaling policy or rules-based placement.
- The binary distribution no longer contains test-framework jars.
- Deprecated BlockJoinFacetComponent and BlockJoinDocSetFacetComponent are removed. Users are encouraged to migrate to uniqueBlock() in JSON Facet API.
- Core level admin API endpoints /admin/threads, /admin/properties, /admin/logging are now only available at the node level.
Other
- Contrib modules are now just "modules". You can easily enable module(s) through environment variable SOLR_MODULES.
- Features lifted out as separate modules are: HDFS, Hadoop-Auth, SQL, Scripting, and JWT-Auth.
- The "dist" folder in the release has been removed. Please update your <lib> entries in your solrconfig.xml to use the new location.
  - The solr-core and solr-solrj jars can be found under server/solr-webapp/webapp/WEB-INF/lib/.
  - The Solr Module jars and their dependencies can be found in modules/<module-name>/lib, packaged individually for each module.
  - The solrj-deps (SolrJ Dependencies) are no longer separated out from the other Server jars.
  - Please refer to the SolrJ Maven artifact to see the exact dependencies you need to include from server/solr-webapp/webapp/WEB-INF/lib/ and server/lib/ext/ if you are loading in SolrJ manually. If you plan on using SolrJ as a JDBC driver, please refer to the JDBC documentation.
  - More information can be found in the Libs documentation.
SolrJ class CloudSolrClient now supports HTTP2. It has a new Builder. See CloudLegacySolrClient for the 8.x version of this class.
In Backup request responses, the response key now uses a map to return information instead of a list. This is only applicable for users returning information in JSON format, which is the default behavior.
SolrMetricProducer / SolrInfoBean APIs have changed and third-party components that implement these APIs need to be updated.
Use of blacklist/whitelist terminology has been completely removed. JWTAuthPlugin parameter algWhitelist is now algAllowlist. The old parameter will still work in 9.x. Environment variables SOLR_IP_WHITELIST and SOLR_IP_BLACKLIST are no longer supported and have been replaced with SOLR_IP_ALLOWLIST and SOLR_IP_DENYLIST.
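As a rough sketch of the environment variable rename described above, the helper below rewrites a dictionary of settings (for example, values parsed out of a deployment's configuration) to use the new names. The function name migrate_ip_env and the sample values are hypothetical. Only the four variable names come from the text.

```python
# Old-to-new names taken from the text above.
RENAMED_ENV_VARS = {
    "SOLR_IP_WHITELIST": "SOLR_IP_ALLOWLIST",
    "SOLR_IP_BLACKLIST": "SOLR_IP_DENYLIST",
}


def migrate_ip_env(env):
    """Return a copy of env with the removed variable names replaced.

    Hypothetical helper, not part of Solr. If the new name is already
    set explicitly, its value wins and the old entry is dropped.
    """
    migrated = {}
    for name, value in env.items():
        new_name = RENAMED_ENV_VARS.get(name, name)
        if new_name != name and new_name in env:
            continue  # explicit new-style setting takes precedence
        migrated[new_name] = value
    return migrated


if __name__ == "__main__":
    # Placeholder values for illustration only.
    old = {"SOLR_IP_WHITELIST": "192.0.2.10", "OTHER_SETTING": "unchanged"}
    print(migrate_ip_env(old))
```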
Solr Backups - Async responses for backups now correctly aggregate and return information. For a collection’s snapshot backup request responses, additional fields (indexVersion, indexFileCount, etc.) were added, similar to incremental backup request responses.
If you are using the HDFS backup repository, you need to change the repository class to org.apache.solr.hdfs.backup.repository.HdfsBackupRepository - see the HDFS Backup Repository section.
HDFS storage support has been moved to a module. Existing Solr configurations do not need any HDFS-related changes, however the module needs to be installed - see the section Solr on HDFS.
The folder $SOLR_HOME/userfiles, used by the "cat" streaming expression, is no longer created automatically on startup. The user must create this folder.
Solr no longer requires a solr.xml in $SOLR_HOME. If one is not found, Solr will instead use the default one from $SOLR_TIP/server/solr/solr.xml. You can revert to the pre-9.0 behaviour by setting environment variable SOLR_SOLRXML_REQUIRED=true or system property -Dsolr.solrxml.required=true. Solr also does not require a zoo.cfg in $SOLR_HOME if started with embedded zookeeper.
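The lookup order described above can be pictured with a small sketch. This is not Solr's implementation: the function name resolve_solr_xml and its required flag are illustrative stand-ins for the behavior that SOLR_SOLRXML_REQUIRED=true (or -Dsolr.solrxml.required=true) switches back on.

```python
from pathlib import Path


def resolve_solr_xml(solr_home, solr_tip, required=False):
    """Pick the solr.xml to load, following the rules in the text above.

    Illustrative only: prefers $SOLR_HOME/solr.xml, and otherwise falls
    back to the default under $SOLR_TIP unless the file is required.
    """
    candidate = Path(solr_home) / "solr.xml"
    if candidate.is_file():
        return candidate
    if required:
        # Pre-9.0 behavior: a missing solr.xml in SOLR_HOME is an error.
        raise FileNotFoundError(f"solr.xml not found in {solr_home}")
    return Path(solr_tip) / "server" / "solr" / "solr.xml"


if __name__ == "__main__":
    # Example paths are placeholders.
    print(resolve_solr_xml("/var/solr/data", "/opt/solr"))
```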
base_url has been removed from stored cluster state. If you’re able to upgrade SolrJ to 8.8.x for all of your client applications, then you can set -Dsolr.storeBaseUrl=false (introduced in Solr 8.8.1) to better align the stored state in Zookeeper with future versions of Solr; as of Solr 9.x, the base_url will no longer be persisted in stored state. However, if you are not able to upgrade SolrJ to 8.8.x for all client applications, then you should set -Dsolr.storeBaseUrl=true so that Solr will continue to store the base_url in Zookeeper. For background, see: SOLR-12182 and SOLR-15145. Support for the solr.storeBaseUrl system property will be removed in Solr 10.x and base_url will no longer be stored.
Analyzer components can now be looked up by their SPI names based on the field type configuration.
The solr-extraction module has been cleaned up to produce solr-extraction- jar instead of solr-cell- jars.
Extra lucene libraries used in modules are no longer packaged in lucene-libs/ under module directories in the binary release. Instead, these libraries will be included with all other module dependencies in lib/.