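In general, indexing is the systematic arrangement of documents (or other entities). An index lets users find information in documents quickly.
In Apache Solr, we can index (add, delete, modify) documents in various formats such as XML, CSV, and PDF. Data can be added to a Solr index in several ways.
This chapter discusses how to add data to an Apache Solr index through several interfaces: the command line, the web interface, and the Java client API.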
Solr ships a post command in its bin/ directory. With this command you can index files of various formats, such as JSON, XML, and CSV, in Apache Solr.
Go to the bin directory of Apache Solr and run the post command with the -h option, as shown in the following code block.
web3@ubuntu:/usr/local/solr-6.4.0/bin$ cd $SOLR_HOME
web3@ubuntu:/usr/local/solr-6.4.0/bin$ ./post -h
The options of the post command are listed below.
Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d [".."]>
    or post -help

   collection name defaults to DEFAULT_SOLR_COLLECTION if not specified

OPTIONS
=======
  Solr options:
    -url <base Solr update URL> (overrides collection, host, and port)
    -host <host> (default: localhost)
    -p or -port <port> (default: 8983)
    -commit yes|no (default: yes)

  Web crawl options:
    -recursive <depth> (default: 1)
    -delay <seconds> (default: 10)

  Directory crawl options:
    -delay <seconds> (default: 0)

  stdin/args options:
    -type <content/type> (default: application/xml)

  Other options:
    -filetypes <type>[,<type>,...] (default:
      xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,
      rtf,htm,html,txt,log)
    -params "<key>=<value>[&<key>=<value>...]" (values must be
      URL-encoded; these pass through to Solr update request)
    -out yes|no (default: no; yes outputs Solr response to console)
    -format solr (sends application/json content as Solr commands
      to /update instead of /update/json/docs)

Examples:

* JSON file: ./post -c wizbang events.json
* XML files: ./post -c records article*.xml
* CSV file: ./post -c signals LATEST-signals.csv
* Directory of files: ./post -c myfiles ~/Documents
* Web crawl: ./post -c gettingstarted http://lucene.apache.org/solr -recursive 1 -delay 1
* Standard input (stdin): echo '{commit: {}}' | ./post -c my_collection -type application/json -out yes -d
* Data as string: ./post -c signals -type text/csv -out yes -d $'id,value\n1,0.47'
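The Solr options above determine where the tool sends its update requests. The following is a minimal Java sketch of that resolution, not the tool's actual source: the class name SolrUpdateUrl and the method resolve are hypothetical, and the URL pattern is taken from the form that appears in the tool's console output (http://localhost:8983/solr/<collection>/update). As a simplification, the sketch rejects a missing collection instead of falling back to DEFAULT_SOLR_COLLECTION.

```java
// Illustrative only: SolrUpdateUrl and resolve are hypothetical names,
// not part of Solr. They mimic how -url, -host, -p and -c relate.
class SolrUpdateUrl {
    static final String DEFAULT_HOST = "localhost"; // documented default for -host
    static final int DEFAULT_PORT = 8983;           // documented default for -p / -port

    /**
     * Returns explicitUrl when one is given (-url overrides collection, host,
     * and port); otherwise builds http://<host>:<port>/solr/<collection>/update.
     */
    static String resolve(String explicitUrl, String host, Integer port, String collection) {
        if (explicitUrl != null && !explicitUrl.isEmpty()) {
            return explicitUrl;
        }
        if (collection == null || collection.isEmpty()) {
            throw new IllegalArgumentException("a collection (-c) or an explicit -url is required");
        }
        String resolvedHost = (host == null || host.isEmpty()) ? DEFAULT_HOST : host;
        int resolvedPort = (port == null) ? DEFAULT_PORT : port;
        return "http://" + resolvedHost + ":" + resolvedPort + "/solr/" + collection + "/update";
    }

    public static void main(String[] args) {
        // Only -c given: host and port fall back to their defaults.
        // Prints http://localhost:8983/solr/solr_sample/update
        System.out.println(resolve(null, null, null, "solr_sample"));

        // -url given: the other three values are ignored.
        System.out.println(resolve("http://search.example.com:8080/solr/records/update",
                "ignored-host", 1234, "ignored-collection"));
    }
}
```

The host name search.example.com in the second call is a placeholder chosen for the illustration.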
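Suppose there is a file named sample.csv with the following content (the file is also in the bin directory).
This dataset contains personal details such as student ID, first name, last name, phone number, and city. The CSV file for the dataset is shown below. Note that the first line of the file must list the field names for the data records.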
id,first_name,last_name,phone_no,location
001,Pruthvi,Reddy,9848022337,Hyderabad
002,kasyap,Sastry,9848022338,Vishakapatnam
003,Rajesh,Khanna,9848022339,Delhi
004,Preethi,Agarwal,9848022330,Pune
005,Trupthi,Mohanty,9848022336,Bhubaneshwar
006,Archana,Mishra,9848022335,Chennai
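To make the role of the header line concrete, here is a small Java sketch of how each data row can be paired with the field names from the first line to form one document. It is not Solr's CSV loader: the class CsvHeaderDemo and the method toDocuments are hypothetical, the parsing ignores quoting and escaping, and the trimming of surrounding whitespace is a choice made in this sketch only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified stand-in for header-driven CSV handling.
class CsvHeaderDemo {
    /** Turns simple CSV text (no quoting or escaping) into one field map per data row. */
    static List<Map<String, String>> toDocuments(String csv) {
        String[] lines = csv.trim().split("\\R");
        // The first line supplies the field names for every following row.
        String[] fieldNames = lines[0].split(",", -1);
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] values = lines[i].split(",", -1);
            Map<String, String> doc = new LinkedHashMap<>();
            for (int j = 0; j < fieldNames.length && j < values.length; j++) {
                doc.put(fieldNames[j].trim(), values[j].trim());
            }
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        String csv = "id,first_name,last_name,phone_no,location\n"
                   + "001,Pruthvi,Reddy,9848022337,Hyderabad\n"
                   + "002,kasyap,Sastry,9848022338,Vishakapatnam\n";
        // Each printed map corresponds to one document keyed by the header names.
        for (Map<String, String> doc : toDocuments(csv)) {
            System.out.println(doc);
        }
    }
}
```

Without the header line, the values in each row would have no field names to attach to, which is why the first line matters.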
The post command indexes this data under the core named solr_sample, as shown below:
web3@ubuntu:/usr/local/solr-6.4.0/bin$ ./post -c solr_sample sample.csv
Running the command indexes the file under the specified core and produces output like the following.
/usr/local/jdk1.8.0_65/bin/java -classpath /usr/local/solr-6.4.0/dist/solr-core-6.4.0.jar -Dauto=yes -Dc=solr_sample -Ddata=files org.apache.solr.util.SimplePostTool sample.csv
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/solr_sample/update...
Entering auto mode. File endings considered are xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
POSTing file sample.csv (text/csv) to [base]
1 files indexed.
COMMITting Solr index changes to http://localhost:8983/solr/solr_sample/update...
Time spent: 0:00:00.663
[
   {
      "id" : "001",
      "name" : "Ram",
      "age" : 53,
      "Designation" : "Manager",
      "Location" : "Hyderabad"
   },
   {
      "id" : "002",
      "name" : "Robert",
      "age" : 43,
      "Designation" : "SR.Programmer",
      "Location" : "Chennai"
   },
   {
      "id" : "003",
      "name" : "Rahim",
      "age" : 25,
      "Designation" : "JR.Programmer",
      "Location" : "Delhi"
   }
]
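JSON documents like the ones above can also be sent to Solr over plain HTTP. The sketch below builds such a request with the JDK's own HTTP client (Java 11 or later is assumed). The class JsonUpdateRequest and the method build are hypothetical names, the core URL is an assumed default, and the /update/json/docs path is the one mentioned in the post tool's help text. The request is sent only when --send is passed, since that step needs a running Solr with a schema that accepts these fields.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Illustrative only: JsonUpdateRequest and build are hypothetical names.
class JsonUpdateRequest {
    /** Builds a POST of the given JSON to <coreUrl>/update/json/docs, asking Solr to commit. */
    static HttpRequest build(String coreUrl, String json) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update/json/docs?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String json = "[{\"id\":\"001\",\"name\":\"Ram\",\"age\":53,"
                    + "\"Designation\":\"Manager\",\"Location\":\"Hyderabad\"}]";
        // Assumed core URL; pass your own as the first argument.
        String coreUrl = args.length > 0 ? args[0] : "http://localhost:8983/solr/solr_sample";
        HttpRequest request = build(coreUrl, json);
        System.out.println(request.method() + " " + request.uri());

        // Contact the server only when explicitly requested.
        if (args.length > 1 && args[1].equals("--send")) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
        }
    }
}
```

Keeping the request construction separate from the sending step makes it possible to inspect the target URL and headers without a server.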
The following Java program adds a document to the Apache Solr index. Save the code in a file named AddingDocument.java.
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class AddingDocument {
   public static void main(String[] args) throws Exception {
      // Preparing the Solr client; the URL path is lowercase /solr/<core name>
      String urlString = "http://localhost:8983/solr/my_core";

      // try-with-resources closes the client when the work is done
      try (SolrClient solr = new HttpSolrClient.Builder(urlString).build()) {
         // Preparing the Solr document
         SolrInputDocument doc = new SolrInputDocument();

         // Adding fields to the document
         doc.addField("id", "003");
         doc.addField("name", "Rajaman");
         doc.addField("age", "34");
         doc.addField("addr", "vishakapatnam");

         // Adding the document to Solr
         solr.add(doc);

         // Saving the changes
         solr.commit();
         System.out.println("Documents added");
      }
   }
}
web3@ubuntu:/usr/local/solr-6.4.0/bin$ javac AddingDocument.java
web3@ubuntu:/usr/local/solr-6.4.0/bin$ java AddingDocument
Documents added