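Full-text search is one of the most common requirements, and the open-source Elasticsearch is currently the leading choice among full-text search engines. It can store, search, and analyze massive amounts of data quickly. Wikipedia, Stack Overflow, and GitHub all use it. Under the hood, Elasticsearch is built on the open-source library Lucene. Lucene cannot be used directly, however: you have to write your own code to call its interfaces. Elasticsearch wraps Lucene and exposes a REST API, so it works out of the box. This article starts from scratch and explains how to build your own full-text search engine with Elasticsearch.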
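Installing JDK 8
Elasticsearch officially recommends Oracle JDK 8. Before installing it, first check whether a JDK is already present on the machine.
rpm -qa | grep -E -i 'jdk|jre'
If there is one, it may be the OpenJDK that ships with the system rather than the Oracle JDK. You can use
rpm -qa | grep -i java | xargs rpm -e --nodeps
to remove every package whose name contains "java" in one go, and then reinstall.
Downloading the JDK from the command line has one annoyance: you must first accept the Oracle license agreement. This can be worked around by setting a cookie.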
wget --no-check-certificate --no-cookies --header "Cookie: oraclelicense=accept-securebackup-cookie" http://download.oracle.com/otn-pub/java/jdk/8u101-b13/jdk-8u101-linux-x64.tar.gz
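Extract the archive with tar -zxvf jdk-8u101-linux-x64.tar.gz (the JAVA_HOME used below assumes it was extracted under /usr/local), then configure the environment variables: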
vi /etc/profile
Add the following and save:
# set java environment
export JAVA_HOME=/usr/local/jdk1.8.0_101
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:${PATH}
After saving, run source /etc/profile so that the environment variables take effect.
Run java -version to confirm that the installation succeeded.
[root@localhost local]# java -version
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)
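The manual check above can also be scripted. The following is a minimal sketch, not part of the original walkthrough: the function names java_major_version, installed_java_version, and check_java are hypothetical helpers, and the parsing assumes the version string format shown in the output above.

```shell
# Extract the major Java version from a version string such as
# "1.8.0_101" or "11.0.2". Legacy "1.x" strings map to x (1.8.0_101 -> 8).
java_major_version() {
    version=$1
    case "$version" in
        1.*) version=${version#1.} ;;
    esac
    # Keep everything before the first ".", "_" or "-".
    echo "${version%%[._-]*}"
}

# Print the version string reported by `java -version`
# (the JVM writes this banner to stderr, hence the redirect).
installed_java_version() {
    java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1
}

# Return 0 when java is on PATH and its major version equals $1.
check_java() {
    wanted=$1
    command -v java >/dev/null 2>&1 || { echo "java not found on PATH" >&2; return 1; }
    found=$(java_major_version "$(installed_java_version)")
    [ "$found" = "$wanted" ]
}
```

For example, check_java 8 && echo "JDK 8 is ready" prints the message only when the JDK installed above is the one found on PATH.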
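Installing Elasticsearch
Manual installation
The simplest approach is to install via Yum or rpm; what is described here is the manual method:
1. Go to the official website and look up the download link for the latest version.
2. Download and extract from the command line:
wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.4.0/elasticsearch-2.4.0.tar.gz
tar -zxvf elasticsearch-2.4.0.tar.gz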
3. Run Elasticsearch
Run sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d, where -d starts it in the background.
As expected when running as root, the following error appears:
[root@localhost bin]# Exception in thread "main" java.lang.RuntimeException: don't run elasticsearch as root.
at org.elasticsearch.bootstrap.Bootstrap.initializeNatives(Bootstrap.java:94)
at org.elasticsearch.bootstrap.Bootstrap.setup(Bootstrap.java:160)
at org.elasticsearch.bootstrap.Bootstrap.init(Bootstrap.java:286)
at org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:35)
Refer to the log for complete error details.
The reason is that Elasticsearch, by default, refuses to be started by the root user.
Solution 1: -Des.insecure.allow.root=true
Edit /usr/local/elasticsearch-2.4.0/bin/elasticsearch and add ES_JAVA_OPTS="-Des.insecure.allow.root=true"
Or pass the option at startup: sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d -Des.insecure.allow.root=true
Note: running as root in a production environment can pose a security risk, so it is not recommended.
Solution 2: add a dedicated user
useradd elastic
chown -R elastic:elastic elasticsearch-2.4.0
su elastic
sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d
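The su step above opens an interactive shell, which is awkward inside a script. A minimal sketch of a non-interactive variant follows; is_root_uid and start_es_as_user are hypothetical helper names, and the user name and install path are simply the ones used in this article.

```shell
# Return 0 when the given numeric uid belongs to root.
is_root_uid() {
    [ "$1" -eq 0 ]
}

# Start Elasticsearch in the background as a dedicated user.
# $1: user name (e.g. elastic), $2: installation directory.
start_es_as_user() {
    es_user=$1
    es_home=$2
    if is_root_uid "$(id -u)"; then
        # Called as root: run the start script as the dedicated user.
        su "$es_user" -c "sh '$es_home/bin/elasticsearch' -d"
    else
        # Already an unprivileged user: start directly.
        sh "$es_home/bin/elasticsearch" -d
    fi
}

# Example (not executed here):
#   start_es_as_user elastic /usr/local/elasticsearch-2.4.0
```

This keeps root out of the Elasticsearch process itself, which is the point of the second solution.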
Use curl http://localhost:9200/ to check whether it is running. A response like the following indicates that it is running normally:
[elastic@localhost local]$ curl http://localhost:9200/
{
  "name" : "Astrid Bloom",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.4.0",
    "build_hash" : "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    "build_timestamp" : "2016-08-29T09:14:17Z",
    "build_snapshot" : false,
    "lucene_version" : "5.5.2"
  },
  "tagline" : "You Know, for Search"
}
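Because the node takes a few seconds to come up after a background start, a script may want to poll the endpoint rather than call curl once. The sketch below is one way to do that; wait_for_es and es_version_from_json are hypothetical helper names, and the sed expression assumes the response layout shown above.

```shell
# Read the JSON returned by GET / on stdin and print the "number" field.
es_version_from_json() {
    sed -n 's/.*"number" *: *"\([^"]*\)".*/\1/p' | head -n 1
}

# Poll a URL once per second until it answers or the attempts run out.
# $1: URL (default http://localhost:9200/), $2: attempts (default 30).
wait_for_es() {
    url=${1:-http://localhost:9200/}
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        # -s: silent, -f: treat HTTP errors as failures.
        if curl -s -f "$url" >/dev/null 2>&1; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    return 1
}

# Example (not executed here):
#   wait_for_es && curl -s http://localhost:9200/ | es_version_from_json
```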
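The default port for the Elasticsearch RESTful API is 9200. By default it does not answer requests addressed to the machine's IP address and can only be reached locally at http://localhost:9200. Changing this requires editing the configuration file.
In other words, the RESTful service is reachable only from the local machine by default, so a service running inside a virtual machine cannot be accessed from the host. To make debugging easier, edit elasticsearch.yml (config/elasticsearch.yml under the installation directory for the manual installation above, or /etc/elasticsearch/elasticsearch.yml for a package installation) and add the following two lines:
network.bind_host: "0.0.0.0"
network.publish_host: _nonloopback:ipv4
Alternatively, uncomment network.host and http.port and set network.host to the machine's external IP address. Then restart Elasticsearch. To shut it down, run ps -ef | grep elasticsearch, find the process, and kill it.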
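As an alternative to searching the ps output by hand, the start script accepts a -p option that writes the process ID to a file, which makes stopping scriptable. The following is a sketch under that assumption; stop_by_pidfile is a hypothetical helper and /tmp/es.pid is an arbitrary example path.

```shell
# Stop the process whose PID is stored in the given file,
# wait for it to exit, then remove the file.
stop_by_pidfile() {
    pidfile=$1
    [ -f "$pidfile" ] || { echo "no PID file at $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # Send SIGTERM so the node can shut down cleanly.
    kill "$pid" 2>/dev/null || { echo "process $pid is not running" >&2; return 1; }
    # Wait up to 30 seconds for the process to disappear.
    tries=30
    while kill -0 "$pid" 2>/dev/null && [ "$tries" -gt 0 ]; do
        sleep 1
        tries=$((tries - 1))
    done
    rm -f "$pidfile"
}

# Example (not executed here):
#   sh /usr/local/elasticsearch-2.4.0/bin/elasticsearch -d -p /tmp/es.pid
#   stop_by_pidfile /tmp/es.pid
```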
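If the service still cannot be reached from outside, the firewall settings may be the cause.
Installing with YUM
Add the Elasticsearch repo by creating elasticsearch.repo under /etc/yum.repos.d/: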
vi /etc/yum.repos.d/elasticsearch.repo
The file contents are as follows:
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
Then install with a plain yum command:
yum install elasticsearch
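For unattended provisioning, the repo file can be written without opening an editor. The sketch below wraps the same contents in a here-document; write_es_repo is a hypothetical helper name, and importing the signing key with rpm --import beforehand is an optional extra step, not something this article requires.

```shell
# Write the Elasticsearch 5.x yum repository definition.
# $1: target file (default /etc/yum.repos.d/elasticsearch.repo).
write_es_repo() {
    target=${1:-/etc/yum.repos.d/elasticsearch.repo}
    cat > "$target" <<'EOF'
[elasticsearch-5.x]
name=Elasticsearch repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
}

# Example (not executed here, needs root):
#   rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
#   write_es_repo
#   yum install -y elasticsearch
```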
Installing with RPM
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.1.1.rpm
rpm --install elasticsearch-5.1.1.rpm
This is just as simple.
Firewall settings
Disable SELinux
sed -i "s/SELINUX=enforcing/SELINUX=disabled/" /etc/selinux/config
setenforce 0
Install firewalld
yum install firewalld firewall-config
systemctl start firewalld.service
systemctl enable firewalld.service
systemctl status firewalld.service
Open the ports
firewall-cmd --permanent --add-port={9200/tcp,9300/tcp}
firewall-cmd --reload
firewall-cmd --state
firewall-cmd --list-all
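Instead of reading the --list-all output by eye, the opened ports can be checked with firewall-cmd --query-port, which exits with a non-zero status when a port is not open. A minimal sketch follows; ports_open is a hypothetical helper, and the FIREWALL_CMD variable is an assumption added only so the command can be swapped out.

```shell
# Return 0 only when every given port/protocol pair is open in firewalld.
# The command can be overridden through FIREWALL_CMD (default: firewall-cmd).
ports_open() {
    for port in "$@"; do
        if ! "${FIREWALL_CMD:-firewall-cmd}" --query-port="$port" >/dev/null 2>&1; then
            echo "port $port is not open" >&2
            return 1
        fi
    done
}

# Example (not executed here):
#   ports_open 9200/tcp 9300/tcp && echo "Elasticsearch ports are open"
```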
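At this point Elasticsearch has been installed successfully. To make better use of it, you still need to install plugins such as a Chinese analyzer, which the next article will cover.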
Managing the Elasticsearch service with systemd
/bin/systemctl daemon-reload # reload the systemd configuration
/bin/systemctl enable elasticsearch.service # start Elasticsearch at boot
systemctl start elasticsearch.service # start Elasticsearch
systemctl stop elasticsearch.service # stop Elasticsearch
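After starting the service it is worth confirming that it actually stayed up. The sketch below reports the unit state and, on failure, prints recent journal lines; es_service_report is a hypothetical helper name and the 20-line limit is an arbitrary choice.

```shell
# Report whether a systemd unit is active; on failure show recent log lines.
# $1: unit name (default elasticsearch.service).
es_service_report() {
    unit=${1:-elasticsearch.service}
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is not running; recent log lines:"
        # Last 20 journal entries for the unit, without a pager.
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
}

# Example (not executed here):
#   systemctl start elasticsearch.service && es_service_report
```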
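Elasticsearch configuration notes
If Elasticsearch was installed via yum or rpm, the default configuration file is /etc/elasticsearch/elasticsearch.yml; for details, see Configuring Elasticsearch. There is also a system configuration file at /etc/sysconfig/elasticsearch, where the following parameters can be set: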
ES_USER | The user to run as, defaults to elasticsearch. |
ES_GROUP | The group to run as, defaults to elasticsearch. |
JAVA_HOME | Set a custom Java path to be used. |
MAX_OPEN_FILES | Maximum number of open files, defaults to 65536. |
MAX_LOCKED_MEMORY | Maximum locked memory size. Set to unlimited if you use the bootstrap.memory_lock option in elasticsearch.yml. |
MAX_MAP_COUNT | Maximum number of memory map areas a process may have. If you use mmapfs as index store type, make sure this is set to a high value. For more information, check the linux kernel documentation about max_map_count. This is set via sysctl before starting elasticsearch. Defaults to 262144. |
LOG_DIR | Log directory, defaults to /var/log/elasticsearch. |
DATA_DIR | Data directory, defaults to /var/lib/elasticsearch. |
CONF_DIR | Configuration file directory (which needs to include elasticsearch.yml and log4j2.properties files), defaults to /etc/elasticsearch. |
ES_JAVA_OPTS | Any additional JVM system properties you may want to apply. |
RESTART_ON_UPGRADE | Configure restart on package upgrade, defaults to false. This means you will have to restart your elasticsearch instance after installing a package manually. This ensures that upgrades in a cluster do not result in continuous shard reallocation, which would cause high network traffic and reduce the response times of your cluster. |
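To make the table concrete, here is what a small /etc/sysconfig/elasticsearch might look like. The file uses plain shell variable assignments. This is only an illustrative sketch: most values repeat the defaults listed above, and the heap size in ES_JAVA_OPTS is an assumed example, not a recommendation from this article.

```shell
# /etc/sysconfig/elasticsearch (illustrative values only)

# Defaults from the table above, written out explicitly.
ES_USER=elasticsearch
ES_GROUP=elasticsearch
MAX_OPEN_FILES=65536
MAX_MAP_COUNT=262144
LOG_DIR=/var/log/elasticsearch
DATA_DIR=/var/lib/elasticsearch
CONF_DIR=/etc/elasticsearch
RESTART_ON_UPGRADE=false

# Needed only when bootstrap.memory_lock is enabled in elasticsearch.yml.
MAX_LOCKED_MEMORY=unlimited

# Assumption: a 1 GB heap, chosen purely as an example value.
ES_JAVA_OPTS="-Xms1g -Xmx1g"
```

After changing this file, restart the service for the new values to take effect.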
In addition, the default installation paths are as follows:
Type | Description | Default Location | Setting |
home | Elasticsearch home directory or $ES_HOME | /usr/share/elasticsearch | |
bin | Binary scripts including elasticsearch to start a node and elasticsearch-plugin to install plugins | /usr/share/elasticsearch/bin | |
conf | Configuration files including elasticsearch.yml | /etc/elasticsearch | path.conf |
conf | Environment variables including heap size, file descriptors. | /etc/sysconfig/elasticsearch | |
data | The location of the data files of each index / shard allocated on the node. Can hold multiple locations. | /var/lib/elasticsearch | path.data |
logs | Log files location. | /var/log/elasticsearch | path.logs |
plugins | Plugin files location. Each plugin will be contained in a subdirectory. | /usr/share/elasticsearch/plugins | |
repo | Shared file system repository locations. Can hold multiple locations. A file system repository can be placed into any subdirectory of any directory specified here. | Not configured | path.repo |
script | Location of script files. | /etc/elasticsearch/scripts | path.scripts |
Reference links: