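In general, indexing is the systematic arrangement of documents (or other entities). An index lets users find information in a document quickly.
In Apache Solr, we can index (add, delete, modify) documents in various formats such as XML, CSV, and PDF. Data can be added to a Solr index in several ways.
This chapter discusses how to add data to an Apache Solr index through several interfaces: the command line, the web interface, and the Java client API.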
Solr ships a post command in its bin/ directory. With this command you can index files of various formats, such as JSON, XML, and CSV, in Apache Solr.
Go to the bin directory of Apache Solr and run the post command with the -h option, as shown in the following code block.
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ cd $SOLR_HOME
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ ./post -h

Executing the above command prints the list of options of the post command, as shown below.
    Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d [".."]>
        or post -help

       collection name defaults to DEFAULT_SOLR_COLLECTION if not specified

    OPTIONS
    =======
      Solr options:
        -url <base Solr update URL> (overrides collection, host, and port)
        -host <host> (default: localhost)
        -p or -port <port> (default: 8983)
        -commit yes|no (default: yes)

      Web crawl options:
        -recursive <depth> (default: 1)
        -delay <seconds> (default: 10)

      Directory crawl options:
        -delay <seconds> (default: 0)

      stdin/args options:
        -type <content/type> (default: application/xml)

      Other options:
        -filetypes <type>[,<type>,...] (default: xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
        -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
        -out yes|no (default: no; yes outputs Solr response to console)
        -format solr (sends application/json content as Solr commands to /update instead of /update/json/docs)

    Examples:

    * JSON file: ./post -c wizbang events.json
    * XML files: ./post -c records article*.xml
    * CSV file: ./post -c signals LATEST-signals.csv
    * Directory of files: ./post -c myfiles ~/Documents
    * Web crawl: ./post -c gettingstarted http://lucene.apache.org/solr -recursive 1 -delay 1
    * Standard input (stdin): echo '{commit: {}}' | ./post -c my_collection -type application/json -out yes -d
    * Data as string: ./post -c signals -type text/csv -out yes -d $'id,value\n1,0.47'

Example
Suppose there is a file named sample.csv with the following content (the file is also located in the bin directory).
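The dataset contains personal details such as student ID, first name, last name, phone number, and city. The CSV file for the dataset is shown below. Note that the first line of the file names the fields of the data records.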
    id, first_name, last_name, phone_no, location
    001, Pruthvi, Reddy, 9848022337, Hyderabad
    002, kasyap, Sastry, 9848022338, Vishakapatnam
    003, Rajesh, Khanna, 9848022339, Delhi
    004, Preethi, Agarwal, 9848022330, Pune
    005, Trupthi, Mohanty, 9848022336, Bhubaneshwar
    006, Archana, Mishra, 9848022335, Chennai

You can index this data under the core named solr_sample with the post command, as shown below:
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ ./post -c solr_sample sample.csv

Executing the above command indexes the given document under the specified core and produces the following output.
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ ./post -c solr_sample sample.csv
    /usr/local/jdk1.8.0_65/bin/java -classpath /usr/local/solr-6.4.0/dist/solr-core-6.4.0.jar -Dauto=yes -Dc=solr_sample -Ddata=files org.apache.solr.util.SimplePostTool sample.csv
    SimplePostTool version 5.0.0
    Posting files to [base] url http://localhost:8983/solr/solr_sample/update...
    Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
    POSTing file sample.csv (text/csv) to [base]
    1 files indexed.
    COMMITting Solr index changes to http://localhost:8983/solr/solr_sample/update...
    Time spent: 0:00:00.663
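The log above shows the post tool sending sample.csv as text/csv to the core's /update URL and then committing. As a rough illustration of that request, here is a minimal sketch using only the JDK's java.net.http client (Java 11 or later). The class name CsvPostSketch, its helper methods, and the "send" argument are hypothetical names chosen for this example. The commit=true query parameter is assumed here as a way to commit in the same request, and the code is not how SimplePostTool itself is implemented.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch only: approximates the request described in the post tool's log.
class CsvPostSketch {

    // Builds the update URL of a core, optionally asking Solr to commit.
    static URI updateUri(String host, int port, String core, boolean commit) {
        String base = "http://" + host + ":" + port + "/solr/" + core + "/update";
        return URI.create(commit ? base + "?commit=true" : base);
    }

    // Builds a POST request that carries CSV text with the text/csv content type.
    static HttpRequest csvRequest(URI uri, String csv) {
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/csv")
                .POST(HttpRequest.BodyPublishers.ofString(csv))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String csv = "id,first_name,last_name,phone_no,location\n"
                   + "001,Pruthvi,Reddy,9848022337,Hyderabad\n";
        HttpRequest request =
                csvRequest(updateUri("localhost", 8983, "solr_sample", true), csv);
        System.out.println(request.method() + " " + request.uri());

        // Assumption: the request is sent only when "send" is passed, because
        // it needs a running Solr instance with a core named solr_sample.
        if (args.length > 0 && args[0].equals("send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
        }
    }
}
```

Run without arguments, the program only prints the request it would send, which makes it safe to try on a machine where Solr is not running.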
    [
      {
        "id" : "001",
        "name" : "Ram",
        "age" : 53,
        "Designation" : "Manager",
        "Location" : "Hyderabad"
      },
      {
        "id" : "002",
        "name" : "Robert",
        "age" : 43,
        "Designation" : "SR.Programmer",
        "Location" : "Chennai"
      },
      {
        "id" : "003",
        "name" : "Rahim",
        "age" : 25,
        "Designation" : "JR.Programmer",
        "Location" : "Delhi"
      }
    ]

Adding Documents Using the Java Client API
The following Java program adds a document to the Apache Solr index. Save the code in a file named AddingDocument.java.
    import java.io.IOException;

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrServerException;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class AddingDocument {
        public static void main(String[] args) throws SolrServerException, IOException {
            // Prepare the Solr client for the core named my_core
            String urlString = "http://localhost:8983/solr/my_core";
            SolrClient solr = new HttpSolrClient.Builder(urlString).build();

            // Prepare the Solr document
            SolrInputDocument doc = new SolrInputDocument();

            // Add fields to the document
            doc.addField("id", "003");
            doc.addField("name", "Rajaman");
            doc.addField("age", "34");
            doc.addField("addr", "vishakapatnam");

            // Add the document to Solr
            solr.add(doc);

            // Commit the changes so the document becomes searchable
            solr.commit();

            System.out.println("Documents added");
        }
    }

Compile and run the above code by executing the following commands in the terminal.
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ javac AddingDocument.java
    web3@ubuntu:/usr/local/solr-6.4.0/bin$ java AddingDocument

After executing the above commands, you will get the following output.
Documents added
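The program above adds one hard-coded document. To index the rows of a file such as sample.csv from Java instead, each row first has to be paired with the field names from the header line. The following is a minimal sketch of that pairing step using only the standard library. The class name CsvRowsSketch and its methods are hypothetical, and a LinkedHashMap stands in for SolrInputDocument so that the example runs without the SolrJ jar. The splitting is naive and assumes no quoted commas inside values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: pairs each CSV row with the field names from the header line.
class CsvRowsSketch {

    // Returns one field-name-to-value map per data row, in file order.
    static List<Map<String, String>> toDocuments(String csv) {
        String[] lines = csv.strip().split("\\R");
        String[] fields = splitAndTrim(lines[0]); // header line names the fields
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isBlank()) {
                continue; // ignore empty lines between records
            }
            String[] values = splitAndTrim(lines[i]);
            Map<String, String> doc = new LinkedHashMap<>();
            for (int j = 0; j < fields.length && j < values.length; j++) {
                doc.put(fields[j], values[j]);
            }
            docs.add(doc);
        }
        return docs;
    }

    // Naive split on commas; trims the spaces that follow each comma.
    static String[] splitAndTrim(String line) {
        String[] parts = line.split(",");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
        }
        return parts;
    }

    public static void main(String[] args) {
        String csv = "id, first_name, last_name, phone_no, location\n"
                   + "001, Pruthvi, Reddy, 9848022337, Hyderabad\n"
                   + "002, kasyap, Sastry, 9848022338, Vishakapatnam\n";
        for (Map<String, String> doc : toDocuments(csv)) {
            System.out.println(doc);
        }
    }
}
```

With SolrJ on the classpath, each map entry would correspond to one doc.addField(name, value) call, followed by a single commit after all documents have been added.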