Solr: Merging Indexes

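Solr is a high-performance full-text search server written in Java 5 and built on Lucene. It extends Lucene with a richer query language, is configurable and extensible, optimizes query performance, and provides a full-featured administration interface, which makes it an excellent full-text search engine. This article covers how to merge indexes in Solr.

Source: https://cwiki.apache.org/confluence/display/solr/Merging+Indexes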

If you need to combine indexes from two different projects or from multiple servers previously used in a distributed configuration, you can use either the IndexMergeTool included in lucene-misc or the CoreAdminHandler.

To merge indexes, they must meet these requirements:

  • The two indexes must be compatible: their schemas should include the same fields and they should analyze fields the same way.
  • The indexes must not include duplicate data.

Ideally, the two indexes should be built using the same schema.
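
A rough first check on schema compatibility can be scripted. The Python sketch below assumes each index's schema is available as schema.xml text whose <field> elements carry name, type, indexed, and stored attributes; the function names are illustrative and not part of Solr. It compares field declarations only and does not inspect the analyzer chain behind each field type, so a clean result is a hint rather than proof that the indexes are compatible.

```python
import xml.etree.ElementTree as ET


def field_definitions(schema_xml):
    """Map each <field> name to its (type, indexed, stored) attributes."""
    root = ET.fromstring(schema_xml)
    return {
        f.get("name"): (f.get("type"), f.get("indexed"), f.get("stored"))
        for f in root.iter("field")
    }


def schema_differences(schema_a, schema_b):
    """List fields missing from one schema or declared differently in the two."""
    a = field_definitions(schema_a)
    b = field_definitions(schema_b)
    return {
        "only_in_a": sorted(a.keys() - b.keys()),
        "only_in_b": sorted(b.keys() - a.keys()),
        "defined_differently": sorted(
            name for name in a.keys() & b.keys() if a[name] != b[name]
        ),
    }
```

If all three lists come back empty, the field declarations match; the field type definitions themselves still need to be compared by hand.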

Using IndexMergeTool

To merge the indexes, do the following:

  1. Find the lucene-core and lucene-misc JAR files that your version of Solr is using. You can do this by copying your solr.war file somewhere and unpacking it (jar xvf solr.war). These two JAR files should be in WEB-INF/lib. They are probably called something like lucene-core-VERSION.jar and lucene-misc-VERSION.jar.
  2. Copy them somewhere easy to find.
  3. Make sure that both indexes you want to merge are closed.
  4. Issue this command:
    java -cp /path/to/lucene-core-VERSION.jar:/path/to/lucene-misc-VERSION.jar \
      org.apache.lucene.misc.IndexMergeTool \
      /path/to/newindex \
      /path/to/index1 \
      /path/to/index2

    This will create a new index at /path/to/newindex that contains both index1 and index2.

  5. Copy this new directory to the location of your application's Solr index (move the old one aside first, of course) and start Solr.

  For example, the command in step 4 might look like this:
    java -cp /tmp/lucene-core-4.4.0.jar:/tmp/lucene-misc-4.4.0.jar \
      org.apache.lucene.misc.IndexMergeTool \
      ./newindex \
      ./app1/solr/data/index \
      ./app2/solr/data/index
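
When the merge has to be scripted, the command can be assembled programmatically. The following Python sketch builds the argument list only; the function name index_merge_command is hypothetical, and the JAR and index paths are the placeholders from the example above. It uses os.pathsep so that the classpath separator is ':' on Unix-like systems and ';' on Windows.

```python
import os


def index_merge_command(core_jar, misc_jar, target_index, source_indexes):
    """Build the argv list for running Lucene's IndexMergeTool."""
    # The procedure above merges two indexes, so require at least two sources.
    if len(source_indexes) < 2:
        raise ValueError("expected at least two source indexes")
    classpath = os.pathsep.join([core_jar, misc_jar])
    return [
        "java", "-cp", classpath,
        "org.apache.lucene.misc.IndexMergeTool",
        target_index,      # the new, merged index
        *source_indexes,   # the existing indexes (both must be closed)
    ]


cmd = index_merge_command(
    "/tmp/lucene-core-4.4.0.jar",
    "/tmp/lucene-misc-4.4.0.jar",
    "./newindex",
    ["./app1/solr/data/index", "./app2/solr/data/index"],
)
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.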

Using CoreAdmin

This method uses the CoreAdminHandler with either the indexDir parameter or the srcCore parameter.

The indexDir parameter specifies the paths to the indexes of the cores to be merged. They are merged into a third core, which must already exist before the merge starts. The indexes must exist on the disk of the Solr host, which may make this approach cumbersome in a distributed environment. With the indexDir parameter, a commit should be called on the cores to be merged (so the IndexWriter will close), and no writes should be allowed on either core until the merge is complete. If writes are allowed, the merged index may be corrupted. Once the merge is complete, a commit should be called on the merged core to make sure the changes are visible to searchers.

The following example shows how to construct the merge command with indexDir:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&indexDir=/home/solr/core1/data/index&indexDir=/home/solr/core2/data/index

In this example, the core parameter names the target core (core0), which must be created before the merge is called.
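
The commit, merge, commit sequence described above can be sketched as an ordered list of requests. This Python sketch only builds the URLs; the helper names are hypothetical, and the commit URL form (/solr/<core>/update?commit=true) is an assumption that does not appear in the text above. Stopping writes to the source cores during the merge is left to the caller.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # base URL taken from the example above


def commit_url(core):
    """URL that asks a core to commit (assumed form, see the note above)."""
    return f"{SOLR}/{core}/update?commit=true"


def merge_by_index_dir_urls(target_core, sources):
    """Return the requests to issue, in order.

    `sources` maps each source core name to its index directory on the Solr host.
    """
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", path) for path in sources.values()]
    # safe="/" keeps the slashes in the index paths unescaped.
    merge = f"{SOLR}/admin/cores?{urlencode(params, safe='/')}"
    return (
        [commit_url(name) for name in sources]  # 1. commit each source core
        + [merge]                               # 2. merge into the target core
        + [commit_url(target_core)]             # 3. make the result searchable
    )


urls = merge_by_index_dir_urls(
    "core0",
    {"core1": "/home/solr/core1/data/index", "core2": "/home/solr/core2/data/index"},
)
```

Each URL can then be requested in turn, for example with urllib.request.urlopen, checking the response before moving on to the next step.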

The srcCore parameter identifies the cores to be merged by name instead of by path. The cores do not need to exist on the same disk as the Solr host, and the merged core does not need to exist prior to issuing the command. srcCore also protects against corruption during creation of the merged core index, so writes are still possible while the merge occurs. However, srcCore can only merge Solr cores; indexes built directly with Lucene should be merged with either the IndexMergeTool or the indexDir parameter.

The following example shows how to construct the merge command with srcCore:

http://localhost:8983/solr/admin/cores?action=mergeindexes&core=core0&srcCore=core1&srcCore=core2
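
The same URL can be assembled for any number of source cores by repeating the srcCore parameter. A minimal Python sketch, assuming the host and core names from the example above; the function name merge_by_src_core_url is illustrative only.

```python
from urllib.parse import urlencode


def merge_by_src_core_url(target_core, source_cores,
                          base="http://localhost:8983/solr"):
    """Build a CoreAdmin mergeindexes URL that names source cores with srcCore."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # A list of pairs (not a dict) lets the srcCore key repeat once per core.
    params += [("srcCore", name) for name in source_cores]
    return f"{base}/admin/cores?{urlencode(params)}"


url = merge_by_src_core_url("core0", ["core1", "core2"])
```

Using urlencode rather than string concatenation means core names with special characters are escaped correctly.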
